Azure OpenAI Service Documentation
Learn how to use Azure OpenAI's powerful language models including the GPT-3, Codex and Embeddings model series for content generation, summarization, semantic search, and natural language to code translation.
- Overview: What is Azure OpenAI Service?
- Quickstart: Quickstarts
- How-to guide: Create a resource
- Tutorial: Embeddings
- How-to guide: Completions
- Training: Intro to Azure OpenAI training
- Concept: Azure OpenAI Models
- Reference: Azure OpenAI, Azure OpenAI Studio, Region support, Quotas and limits
- Apply for access to Azure OpenAI
- Video: Combining OpenAI models with the power of Azure
- Reference: REST API, Tools
Article • 05/01/2023
Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-3, Codex and Embeddings model series. In addition, the new GPT-4 and ChatGPT (gpt-35-turbo) model series are now available in preview. These models can be easily adapted to your specific task including but not limited to content generation, summarization, semantic search, and natural language to code translation. Users can access the service through REST APIs, Python SDK, or our web-based interface in the Azure OpenAI Studio.
Features overview
| Feature | Azure OpenAI |
| --- | --- |
| Models available | **NEW** GPT-4 series (preview), **NEW** ChatGPT (gpt-35-turbo) (preview), GPT-3 base series, Codex series, Embeddings series. Learn more in our Models page. |
| Fine-tuning | Ada, Babbage, Curie, Cushman*, Davinci* |
| Price | |
| Virtual network support & private link support | Yes |
| Managed Identity | Yes, via Azure Active Directory |
| UI experience | Azure Portal for account & resource management, Azure OpenAI Service Studio for model exploration and fine-tuning |
| Regional availability | East US, South Central US, West Europe |
| Content filtering | Prompts and completions are evaluated against our content policy with automated systems. High-severity content will be filtered. |

\* Currently unavailable. East US and West Europe fine-tuning is currently unavailable to new customers. Please use South Central US for US-based training.
At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Generative models such as the ones available in Azure OpenAI have significant potential benefits, but without careful design and thoughtful mitigations, such models have the potential to generate incorrect or even harmful content. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes requiring applicants to show well-defined use cases, incorporating Microsoft’s principles for responsible AI use , building content filters to support customers, and providing responsible AI implementation guidance to onboarded customers.
How do I get access to Azure OpenAI?
Access is currently limited as we navigate high demand, upcoming product improvements, and Microsoft’s commitment to responsible AI . For now, we're working with customers with an existing partnership with Microsoft, lower risk use cases, and those committed to incorporating mitigations.
More specific information is included in the application form. We appreciate your patience as we work to responsibly enable broader access to Azure OpenAI.
Apply here for access:
Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-4, GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.
With Azure OpenAI, customers get the security capabilities of Microsoft Azure while running the same models as OpenAI. Azure OpenAI offers private networking, regional availability, and responsible AI content filtering.
Prompts & Completions
The completions endpoint is the core component of the API service. This API provides access to the model's text-in, text-out interface. Users simply need to provide an input prompt containing the English text command, and the model will generate a text completion.
Here's an example of a simple prompt and completion:
Prompt:

```
"""
count to 5 in a for loop
"""
```

Completion:

```
for i in range(1, 6):
    print(i)
```
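As a minimal sketch, a completions call can be made over REST with only the Python standard library. The resource name, deployment name, and API version below are placeholders, and the exact route should be checked against the REST API reference:

```python
import json
import urllib.request

# Placeholder resource, deployment, and API version -- substitute your own.
ENDPOINT = "https://YOUR-RESOURCE.openai.azure.com"
DEPLOYMENT = "my-davinci-deployment"
API_VERSION = "2022-12-01"

def build_completion_request(prompt: str, max_tokens: int = 60):
    """Return the URL and encoded JSON body for a completions call."""
    url = (f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}"
           f"/completions?api-version={API_VERSION}")
    body = {"prompt": prompt, "max_tokens": max_tokens}
    return url, json.dumps(body).encode("utf-8")

def complete(prompt: str, api_key: str) -> str:
    """Send the prompt and return the model's text completion."""
    url, data = build_completion_request(prompt)
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json", "api-key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["text"]
```

The same request shape works from any HTTP client; the Python SDK and Azure OpenAI Studio wrap this endpoint for you.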
Tokens
Azure OpenAI processes text by breaking it down into tokens. Tokens can be words or just chunks of characters. For example, the word “hamburger” gets broken up into the tokens “ham”, “bur” and “ger”, while a short and common word like “pear” is a single token. Many tokens start with a whitespace, for example “ hello” and “ bye”.
The total number of tokens processed in a given request depends on the length of your input, output and request parameters. The quantity of tokens being processed will also affect your response latency and throughput for the models.
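Because the per-request token total drives both cost and latency, it helps to estimate counts while planning. The heuristic below is an approximation only; the 4-characters-per-token ratio is an assumption for English text, and a real tokenizer should be used for exact counts:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token for English.

    This is a planning heuristic only -- use a real tokenizer (for
    example, the tiktoken library) when you need exact counts for
    billing or request limits.
    """
    return max(1, round(len(text) / 4))
```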
Resources
Azure OpenAI is a new product offering on Azure. You can get started with Azure OpenAI the same way as any other Azure product where you create a resource, or instance of the service, in your Azure Subscription. You can read more about Azure's resource management design.
Deployments
Once you create an Azure OpenAI Resource, you must deploy a model before you can start making API calls and generating text. This action can be done using the
Deployment APIs. These APIs allow you to specify the model you wish to use.
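As a hedged sketch, a deployment-creation request might be assembled as below. The endpoint placeholder, route, and scale_settings body reflect one plausible shape of the Deployment APIs at this API version; verify against the reference before use:

```python
import json

# Hypothetical resource name -- substitute your own. The exact route and
# body fields are defined by the Deployment APIs reference for your
# chosen api-version; this is an assumed shape, not a guaranteed one.
ENDPOINT = "https://YOUR-RESOURCE.openai.azure.com"
API_VERSION = "2022-12-01"

def build_deployment_request(model: str, scale_type: str = "standard"):
    """Return the URL and JSON body for creating a deployment of `model`."""
    url = f"{ENDPOINT}/openai/deployments?api-version={API_VERSION}"
    body = {"model": model, "scale_settings": {"scale_type": scale_type}}
    return url, json.dumps(body).encode("utf-8")
```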
In-context learning
The models used by Azure OpenAI use natural language instructions and examples provided during the generation call to identify the task being asked and skill required. When you use this approach, the first part of the prompt includes natural language instructions and/or examples of the specific task desired. The model then completes the task by predicting the most probable next piece of text. This technique is known as "in-context" learning. These models aren't retrained during this step but instead give predictions based on the context you include in the prompt.
There are three main approaches for in-context learning: Few-shot, one-shot and zero-shot. These approaches vary based on the amount of task-specific data that is given to the model:
Few-shot: In this case, a user includes several examples in the call prompt that demonstrate the expected answer format and content. The following example shows a few-shot prompt where we provide multiple examples (the model will generate the last answer):
```
Convert the questions to a command:

Q: Ask Constance if we need some bread.
A: send-msg `find constance` Do we need some bread?

Q: Send a message to Greg to figure out if things are ready for Wednesday.
A: send-msg `find greg` Is everything ready for Wednesday?

Q: Ask Ilya if we're still having our meeting this evening.
A: send-msg `find ilya` Are we still having a meeting this evening?

Q: Contact the ski store and figure out if I can get my skis fixed before I leave on Thursday.
A: send-msg `find ski store` Would it be possible to get my skis fixed before I leave on Thursday?

Q: Thank Nicolas for lunch.
A: send-msg `find nicolas` Thank you for lunch!

Q: Tell Constance that I won't be home before 19:30 tonight — unmovable meeting.
A: send-msg `find constance` I won't be home before 19:30 tonight. I have a meeting I can't move.

Q: Tell John that I need to book an appointment at 10:30.
A:
```
The number of examples typically ranges from 0 to 100, depending on how many can fit within the maximum input length for a single prompt. Maximum input length can vary depending on the specific models you use. Few-shot learning enables a major reduction in the amount of task-specific data required for accurate predictions. This approach typically performs less accurately than a fine-tuned model.
One-shot: This case is the same as the few-shot approach except only one example is provided.
Zero-shot: In this case, no examples are provided to the model and only the task request is provided.
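The three approaches differ only in how many worked examples the prompt carries. A small helper (names illustrative) makes that concrete:

```python
def build_prompt(instruction: str, examples, query: str) -> str:
    """Assemble an in-context learning prompt.

    examples: list of (question, answer) pairs. An empty list gives a
    zero-shot prompt, one pair gives one-shot, and several pairs give
    a few-shot prompt.
    """
    lines = [instruction]
    for question, answer in examples:
        lines += [f"Q: {question}", f"A: {answer}"]
    # End with the unanswered query so the model completes the answer.
    lines += [f"Q: {query}", "A:"]
    return "\n".join(lines)
```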
Models
The service provides users access to several different models. Each model provides a different capability and price point.
GPT-4 models are the latest available models. These models are currently in preview. For access, existing Azure OpenAI customers can apply by filling out this form .
The GPT-3 base models are known as Davinci, Curie, Babbage, and Ada in decreasing order of capability and increasing order of speed.
The Codex series of models is a descendant of GPT-3 and has been trained on both natural language and code to power natural language to code use cases. Learn more about each model on our models concept page.
Learn more about the underlying models that power Azure OpenAI.
Article • 04/27/2023
This article contains a quick reference and a detailed description of the quotas and limits for Azure OpenAI in Azure Cognitive Services.
The following sections provide you with a quick guide to the quotas and limits that apply to the Azure OpenAI:
| Limit name | Limit value |
| --- | --- |
| OpenAI resources per region per Azure subscription | 3 |
| Requests per minute per model* | Davinci-models (002 and later): 120; ChatGPT model (preview): 300; GPT-4 models (preview): 18; All other models: 300 |
| Tokens per minute per model* | Davinci-models (002 and later): 40,000; ChatGPT model: 120,000; GPT-4 8k model: 10,000; GPT-4 32k model: 32,000; All other models: 120,000 |
| Max fine-tuned model deployments* | 2 |
| Ability to deploy same model to multiple deployments | Not allowed |
| Total number of training jobs per resource | 100 |
| Max simultaneous running training jobs per resource | 1 |
| Max training jobs queued | 20 |
| Max files per resource | 50 |
| Total size of all files per resource | 1 GB |
| Max training job time (job will fail if exceeded) | 720 hours |
| Max training job size (tokens in training file) x (# of epochs) | 2 billion |
*The limits are subject to change. We anticipate that you will need higher limits as you move toward production and your solution scales. When you know your solution requirements, please reach out to us by applying for a quota increase here: https://aka.ms/oai/quotaincrease
For information on max tokens for different models, consult the models article.
General best practices to mitigate throttling during autoscaling
To minimize issues related to throttling, it's a good idea to use the following techniques:

- Implement retry logic in your application.
- Avoid sharp changes in the workload. Increase the workload gradually.
- Test different load increase patterns.
- Create another OpenAI service resource in the same or different regions, and distribute the workload among them.
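The retry advice above can be sketched as a small wrapper. The function name and delay values here are illustrative, not part of any SDK:

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0,
                 retry_on=(Exception,)):
    """Run call(), retrying on failure with exponential backoff.

    Delays grow as base_delay * 2**attempt, plus up to 250 ms of random
    jitter so that many clients don't retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.25))
```

In practice you would narrow `retry_on` to the throttling error your client library raises (for example, an HTTP 429 response).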
The next sections describe specific cases of adjusting quotas.
How to request increases to the default quotas and limits
At this time, due to overwhelming demand we cannot accept any new resource or quota increase requests.
Note
Ensure that you thoroughly assess your current resource utilization and confirm that it is approaching full capacity. Be aware that we will not grant additional resources if efficient usage of existing resources is not observed.
Learn more about the underlying models that power Azure OpenAI.
Article • 05/11/2023
Azure OpenAI provides access to many different models, grouped by family and capability. A model family typically associates models by their intended task. The following table describes model families currently available in Azure OpenAI. Not all models are available in all regions currently. Refer to the model capability table in this article for a full breakdown.
Model family | Description |
GPT-4 | A set of models that improve on GPT-3.5 and can understand as well as generate natural language and code. These models are currently in preview. |
GPT-3 | A series of models that can understand and generate natural language. This includes the new ChatGPT model (preview). |
Codex | A series of models that can understand and generate code, including translating natural language to code. |
Embeddings | A set of models that can understand and use embeddings. An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Currently, we offer three families of Embeddings models for different functionalities: similarity, text search, and code search. |
Each model family has a series of models that are further distinguished by capability. These capabilities are typically identified by names, and the alphabetical order of these names generally signifies the relative capability and cost of that model within a given model family. For example, GPT-3 models use names such as Ada, Babbage, Curie, and Davinci to indicate relative capability and cost. Davinci is more capable and more expensive than Curie, which in turn is more capable and more expensive than Babbage, and so on.
Note
Any task that can be performed by a less capable model like Ada can be performed by a more capable model like Curie or Davinci.
Azure OpenAI model names typically correspond to the following standard naming convention:
{capability}-{family}[-{input-type}]-{identifier}
Element | Description |
{capability} | The model capability of the model. For example, GPT-3 models use text , while Codex models use code . |
{family} | The relative family of the model. For example, GPT-3 models include ada , babbage , curie , and davinci . |
{input-type} | (Embeddings models only) The input type of the embedding supported by the model. For example, text search embedding models support doc and query . |
{identifier} | The version identifier of the model. |
For example, our most powerful GPT-3 model is called text-davinci-003 , while our most powerful Codex model is called code-davinci-002 .
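For illustration only, the convention can be expressed as a regular expression. This pattern is inferred from the description above and is not an official grammar; newer IDs such as gpt-35-turbo and gpt-4 intentionally fall outside it:

```python
import re

# Assumed pattern derived from {capability}-{family}[-{input-type}]-{identifier}.
# Not an official grammar: some model IDs do not conform.
NAME_RE = re.compile(
    r"^(?P<capability>[a-z]+(?:-[a-z]+)*?)"
    r"-(?P<family>ada|babbage|curie|davinci|cushman)"
    r"(?:-(?P<input_type>doc|query|code|text))?"   # embeddings models only
    r"-(?P<identifier>\d{3})$"
)

def parse_model_name(name: str):
    """Split a model ID into its naming-convention elements, or None."""
    match = NAME_RE.match(name)
    return match.groupdict() if match else None
```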
The older versions of GPT-3 models named ada , babbage , curie , and davinci that don't follow the standard naming convention are primarily intended for fine-tuning. For more information, see Learn how to customize a model for your application.
You can get a list of models that are available for both inference and fine-tuning by your Azure OpenAI resource by using the Models List API.
We recommend starting with the most capable model in a model family to confirm whether the model capabilities meet your requirements. Then you can stay with that model or move to a model with lower capability and cost, optimizing around that model's capabilities.
GPT-4 can solve difficult problems with greater accuracy than any of OpenAI's previous models. Like gpt-35-turbo, GPT-4 is optimized for chat but works well for traditional completions tasks.
These models are currently in preview. For access, existing Azure OpenAI customers can apply by filling out this form .
The gpt-4 model supports 8,192 max input tokens and the gpt-4-32k model supports up to 32,768 tokens.
The GPT-3 models can understand and generate natural language. The service offers four model capabilities, each with different levels of power and speed suitable for different tasks. Davinci is the most capable model, while Ada is the fastest. In the order of greater to lesser capability, the models are:
text-davinci-003
text-curie-001
text-babbage-001
text-ada-001
While Davinci is the most capable, the other models provide significant speed advantages. Our recommendation is for users to start with Davinci while experimenting, because it produces the best results and will validate the value that Azure OpenAI can provide. Once you have a prototype working, you can then optimize your model choice for the best latency/performance balance for your application.
Davinci
Davinci is the most capable model and can perform any task the other models can perform, often with less instruction. For applications requiring deep understanding of the content, like summarization for a specific audience and creative content generation, Davinci produces the best results. The increased capabilities provided by Davinci require more compute resources, so Davinci costs more and isn't as fast as other models.
Another area where Davinci excels is in understanding the intent of text. Davinci is excellent at solving many kinds of logic problems and explaining the motives of
characters. Davinci has been able to solve some of the most challenging AI problems involving cause and effect.
Use for: Complex intent, cause and effect, summarization for audience
Curie
Curie is powerful, yet fast. While Davinci is stronger when it comes to analyzing complicated text, Curie is capable for many nuanced tasks like sentiment classification and summarization. Curie is also good at answering questions, performing Q&A, and serving as a general service chatbot.
Use for: Language translation, complex classification, text sentiment, summarization
Babbage
Babbage can perform straightforward tasks like simple classification. It’s also capable when it comes to semantic search, ranking how well documents match up with search queries.
Use for: Moderate classification, semantic search classification
Ada
Ada is usually the fastest model and can perform tasks like parsing text, address correction and certain kinds of classification tasks that don’t require too much nuance. Ada’s performance can often be improved by providing more context.
Use for: Parsing text, simple classification, address correction, keywords
ChatGPT (gpt-35-turbo) (preview)
The ChatGPT model (gpt-35-turbo) is a language model designed for conversational interfaces and the model behaves differently than previous GPT-3 models. Previous models were text-in and text-out, meaning they accepted a prompt string and returned a completion to append to the prompt. However, the ChatGPT model is conversation-in and message-out. The model expects a prompt string formatted in a specific chat-like transcript format, and returns a completion that represents a model-written message in the chat.
To learn more about the ChatGPT model and how to interact with the Chat API check out our in-depth how-to.
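As a sketch of the conversation-in, message-out shape, the helper below assembles the messages array sent to the Chat Completion API. The role names follow the documented system/user/assistant convention; the function name and defaults are illustrative:

```python
def build_chat_body(system_message: str, history, user_message: str,
                    max_tokens: int = 200) -> dict:
    """Assemble the request body for a Chat Completion call.

    history: list of (role, content) pairs from earlier turns, where
    role is "user" or "assistant".
    """
    messages = [{"role": "system", "content": system_message}]
    messages += [{"role": role, "content": content}
                 for role, content in history]
    # The final user message is the turn the model should respond to.
    messages.append({"role": "user", "content": user_message})
    return {"messages": messages, "max_tokens": max_tokens}
```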
The Codex models are descendants of our base GPT-3 models that can understand and generate code. Their training data contains both natural language and billions of lines of public code from GitHub.
They’re most capable in Python and proficient in over a dozen languages, including C#, JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and Shell. In the order of greater to lesser capability, the Codex models are:
code-davinci-002
code-cushman-001
Davinci
Similar to GPT-3, Davinci is the most capable Codex model and can perform any task the other models can perform, often with less instruction. For applications requiring deep understanding of the content, Davinci produces the best results. Greater capabilities require more compute resources, so Davinci costs more and isn't as fast as other models.
Cushman
Cushman is powerful, yet fast. While Davinci is stronger when it comes to analyzing complicated tasks, Cushman is a capable model for many code generation tasks.
Cushman typically runs faster and cheaper than Davinci, as well.
Important
We strongly recommend using text-embedding-ada-002 (Version 2) . This model/version provides parity with OpenAI's text-embedding-ada-002 . To learn more about the improvements offered by this model, please refer to OpenAI's blog post . Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.
Currently, we offer three families of Embeddings models for different functionalities:
Similarity Text search Code search
Each family includes models across a range of capability. The following list indicates the length of the numerical vector returned by the service, based on model capability:
Ada: 1024 dimensions
Babbage: 2048 dimensions
Curie: 4096 dimensions
Davinci: 12288 dimensions
Davinci is the most capable, but is slower and more expensive than the other models. Ada is the least capable, but is both faster and cheaper.
Similarity embedding
These models are good at capturing semantic similarity between two or more pieces of text.
| Use cases | Models |
| --- | --- |
| Clustering, regression, anomaly detection, visualization | text-similarity-ada-001, text-similarity-babbage-001, text-similarity-curie-001, text-similarity-davinci-001 |
Text search embedding
These models help measure whether long documents are relevant to a short search query. There are two input types supported by this family: doc , for embedding the documents to be retrieved, and query , for embedding the search query.
| Use cases | Models |
| --- | --- |
| Search, context relevance, information retrieval | text-search-ada-doc-001, text-search-ada-query-001, text-search-babbage-doc-001, text-search-babbage-query-001, text-search-curie-doc-001, text-search-curie-query-001, text-search-davinci-doc-001, text-search-davinci-query-001 |
Code search embedding
Similar to text search embedding models, there are two input types supported by this family: code , for embedding code snippets to be retrieved, and text , for embedding natural language search queries.
| Use cases | Models |
| --- | --- |
| Code search and relevance | code-search-ada-code-001, code-search-ada-text-001, code-search-babbage-code-001, code-search-babbage-text-001 |
When using our embeddings models, keep in mind their limitations and risks.
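Returned embedding vectors are commonly compared with cosine similarity; a minimal standard-library sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors.

    Returns 1.0 for identical directions, 0.0 for orthogonal vectors,
    and -1.0 for opposite directions.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

For text search, you would embed documents with a doc model and the query with the matching query model, then rank documents by this score.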
Important
South Central US is temporarily unavailable for creating new resources due to high demand.
GPT-3 Models
These models can be used with Completion API requests.
gpt-35-turbo is the only model that can be used with both Completion API requests and the Chat Completion API.
| Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
| --- | --- | --- | --- | --- |
| ada | N/A | N/A | 2,049 | Oct 2019 |
| text-ada-001 | East US, South Central US, West Europe | N/A | 2,049 | Oct 2019 |
| babbage | N/A | N/A | 2,049 | Oct 2019 |
| text-babbage-001 | East US, South Central US, West Europe | N/A | 2,049 | Oct 2019 |
| curie | N/A | N/A | 2,049 | Oct 2019 |
| text-curie-001 | East US, South Central US, West Europe | N/A | 2,049 | Oct 2019 |
| davinci | N/A | N/A | 2,049 | Oct 2019 |
| text-davinci-001 | South Central US, West Europe | N/A | | |
| text-davinci-002 | East US, South Central US, West Europe | N/A | 4,097 | Jun 2021 |
| text-davinci-003 | East US, West Europe | N/A | 4,097 | Jun 2021 |
| text-davinci-fine-tune-002 | N/A | N/A | | |
| gpt-35-turbo 1 (ChatGPT) (preview) | East US, France Central, South Central US, West Europe | N/A | 4,096 | Sep 2021 |
1 Currently, only version 0301 of this model is available. This version of the model will be deprecated on 8/1/2023 in favor of a newer version of the gpt-35-turbo model. See ChatGPT model versioning for more details.
GPT-4 Models
These models can only be used with the Chat Completion API.
Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
gpt-4 1,2 (preview) | East US, France Central | N/A | 8,192 | September 2021 |
gpt-4-32k 1,2 (preview) | East US, France Central | N/A | 32,768 | September 2021 |
1 The model is in preview and only available by request .
2 Currently, only version 0314 of this model is available.
Codex Models
These models can only be used with Completions API requests.
Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
code-cushman-001 1 | South Central US, West Europe | Currently unavailable | 2,048 | |
code-davinci-002 | East US, West Europe | N/A | 8,001 | Jun 2021 |
1 The model is available for fine-tuning by request only. Currently we aren't accepting new requests to fine-tune the model.
Embeddings Models
These models can only be used with Embedding API requests.
7 Note
We strongly recommend using text-embedding-ada-002 (Version 2) . This model/version provides parity with OpenAI's text-embedding-ada-002 . To learn more about the improvements offered by this model, please refer to OpenAI's blog post . Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.
Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
text-embedding-ada-002 (version 2) | East US, South Central US | N/A | 8,191 | Sep 2021 |
text-embedding-ada-002 (version 1) | East US, South Central US, West Europe | N/A | 4,095 | Sep 2021 |
text-similarity-ada-001 | East US, South Central US, West Europe | N/A | 2,046 | Aug 2020 |
text-similarity-babbage-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
text-similarity-curie-001 | East US, South Central US, West Europe | N/A | 2,046 | Aug 2020 |
Model ID | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
text-similarity-davinci-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
text-search-ada-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
text-search-ada-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
text-search-babbage-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
text-search-babbage-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
text-search-curie-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
text-search-curie-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
text-search-davinci-doc-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
text-search-davinci-query-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
code-search-ada-code-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
code-search-ada-text-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
code-search-babbage-code-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
code-search-babbage-text-001 | South Central US, West Europe | N/A | 2,046 | Aug 2020 |
Learn more about Azure OpenAI
Learn more about fine-tuning Azure OpenAI models
Article • 05/11/2023
Azure OpenAI is now available in the France Central region. Check the models page, for the latest information on model availability in each region.
Inactive deployments of customized models will now be deleted after 15 days; models will remain available for redeployment. If a customized (fine-tuned) model is deployed for more than fifteen (15) days during which no completions or chat completions calls are made to it, the deployment will automatically be deleted (and no hosting charges will be incurred for that model).
GPT-4 series models are now available in preview on Azure OpenAI. To request access, existing Azure OpenAI customers can apply by filling out this form . These models are currently available in the East US and South Central US regions.
New Chat Completion API for ChatGPT and GPT-4 models released in preview on 3/21. To learn more, check out the updated quickstarts and how-to article.
ChatGPT (gpt-35-turbo) preview. To learn more, check out the how-to article.
Increased training limits for fine-tuning: The max training job size (tokens in training file) x (# of epochs) is 2 Billion tokens for all models. We have also increased the max training job from 120 to 720 hours.
Adding additional use cases to your existing access. Previously, the process for adding new use cases required customers to reapply to the service. Now, we're releasing a new process that allows you to quickly add new use cases to your use of the service. This process follows the established Limited Access process within Azure Cognitive Services. Existing customers can attest to any and all new use cases here . Please note that this is required anytime you would like to use the service for a new use case you did not originally apply for.
New Features
.NET SDK (inference) preview release | Samples

Terraform SDK update to support Azure OpenAI management operations.

Inserting text at the end of a completion is now supported with the suffix parameter.
Updates
Content filtering is on by default.
New articles on:
Monitoring an Azure OpenAI Service

Plan and manage costs for Azure OpenAI
New training course:
New Features
Service GA. Azure OpenAI Service is now generally available.
New models: Addition of the latest text model, text-davinci-003 (East US, West Europe), text-embedding-ada-002 (East US, South Central US, West Europe)
New features
The latest models from OpenAI. Azure OpenAI provides access to all the latest models including the GPT-3.5 series.
New API version (2022-12-01). This update includes several requested enhancements including token usage information in the API response, improved
error messages for files, alignment with OpenAI on fine-tuning creation data structure, and support for the suffix parameter to allow custom naming of fine-tuned jobs.
Higher request per second limits. 50 for non-Davinci models. 20 for Davinci models.
Faster fine-tune deployments. Deploy Ada and Curie fine-tuned models in under 10 minutes.
Higher training limits: 40M training tokens for Ada, Babbage, and Curie. 10M for Davinci.
Process for requesting modifications to the abuse & misuse data logging & human review. Today, the service logs request/response data for the purposes of abuse and misuse detection to ensure that these powerful models aren't abused. However, many customers have strict data privacy and security requirements that require greater control over their data. To support these use cases, we're releasing a new process for customers to modify the content filtering policies or turn off the abuse logging for low-risk use cases. This process follows the established Limited Access process within Azure Cognitive Services and existing OpenAI customers can apply here .
Customer managed key (CMK) encryption. CMK provides customers greater control over managing their data in Azure OpenAI by providing their own encryption keys used for storing training data and customized models. Customer-managed keys (CMK), also known as bring your own key (BYOK), offer greater flexibility to create, rotate, disable, and revoke access controls. You can also audit the encryption keys used to protect your data. Learn more from our encryption at rest documentation.
Lockbox support

SOC-2 compliance
Logging and diagnostics through Azure Resource Health, Cost Analysis, and Metrics & Diagnostic settings.
Studio improvements. Numerous usability improvements to the Studio workflow including Azure AD role support to control who in the team has access to create fine-tuned models and deploy.
Changes (breaking)
Fine-tuning create API request has been updated to match OpenAI’s schema.
Preview API versions:

```json
{
  "training_file": "file-XGinujblHPwGLSztz8cPS8XY",
  "hyperparams": {
    "batch_size": 4,
    "learning_rate_multiplier": 0.1,
    "n_epochs": 4,
    "prompt_loss_weight": 0.1
  }
}
```

API version 2022-12-01:

```json
{
  "training_file": "file-XGinujblHPwGLSztz8cPS8XY",
  "batch_size": 4,
  "learning_rate_multiplier": 0.1,
  "n_epochs": 4,
  "prompt_loss_weight": 0.1
}
```
Content filtering is temporarily off by default. Azure content moderation works differently than OpenAI. Azure OpenAI runs content filters during the generation call to detect harmful or abusive content and filters them from the response. Learn More
These filters will be re-enabled in Q1 2023 and will be on by default.
Customer actions
Contact Azure Support if you would like these turned on for your subscription. Apply for filtering modifications if you would like to have them remain off. (This option will be for low-risk use cases only.)
Learn more about the underlying models that power Azure OpenAI.
FAQ
If you can't find answers to your questions in this document and still need help, check the Cognitive Services support options guide. Azure OpenAI is part of Azure Cognitive Services.
Azure OpenAI doesn't use customer data to retrain models. For more information, see the Azure OpenAI data, privacy, and security guide.
Azure OpenAI supports the latest GPT-4 models. These models are currently only available by request. For access, existing Azure OpenAI customers can apply by filling out this form .
Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.
With Azure OpenAI, customers get the security capabilities of Microsoft Azure while running the same models as OpenAI.
Yes, as part of Azure Cognitive Services, Azure OpenAI supports VNETs and Private Endpoints. To learn more, consult the Cognitive Services virtual networking guidance.
No, GPT-4 is designed by OpenAI to be multimodal, but currently only text input and output are supported.
Previously, the process for adding new use cases required customers to reapply to the service. Now, we're releasing a new process that allows you to quickly add new use cases to your use of the service. This process follows the established Limited Access process within Azure Cognitive Services. Existing customers can attest to any and all new use cases here. Note that this is required any time you would like to use the service for a new use case you didn't originally apply for.
This error typically occurs when you try to send a batch of text to embed in a single API request as an array. Currently Azure OpenAI does not support batching with embedding requests. Embeddings API calls should consist of a single string input per request. The string can be up to 8191 tokens in length when using the text-embedding-ada-002 (Version 2) model.
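Because each call accepts only a single string, embedding a list of texts means looping, one request per string. A minimal sketch (the `embed_one` callable is a stand-in for your actual client call, such as an `openai.Embedding.create` call with your deployment name):

```python
def embed_all(texts, embed_one):
    """Embed a list of texts one request at a time.

    Azure OpenAI's embeddings endpoint accepts a single string input
    per request, so we cannot send the whole list as an array.
    `embed_one` is a stand-in for your actual client call.
    """
    embeddings = []
    for text in texts:
        embeddings.append(embed_one(text))
    return embeddings

# Example with a dummy embedder (a real one would call the API):
fake = lambda text: [float(len(text))]
print(embed_all(["ab", "abcd"], fake))  # [[2.0], [4.0]]
```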
Check out our introduction to prompt engineering. While these models are extremely powerful, their behavior is also very sensitive to the prompts they receive from the user. This makes prompt construction an important skill to develop. After you've mastered the introduction, check out our article on more advanced prompt engineering techniques.
Access is currently limited as we navigate high demand, upcoming product improvements, and Microsoft's commitment to responsible AI. For now, we're working with customers with an existing partnership with Microsoft, lower risk use cases, and those committed to incorporating mitigations. In addition to applying for initial access, all solutions using Azure OpenAI are required to go through a use case review before they can be released for production use. Apply here for initial access or for a production review: Apply now
We don't currently provide a timeline for access approval.
For monthly updates, see our what's new page.
Check out our introduction to Azure OpenAI training course.
We recommend posting questions on Microsoft Q&A.
Alternatively, you can post questions on Stack Overflow.
Azure OpenAI is part of Azure Cognitive Services. You can learn about all the support options for Azure Cognitive Services in the support and help options guide.
Consult the Azure OpenAI model availability guide.
Consult the Azure OpenAI model availability guide for region availability.
100
We don't have a defined API response time Service Level Agreement (SLA) at this time. The overall SLA for Azure OpenAI Service is the same as for other Azure Cognitive Services. For more information, see the Cognitive Services section of the Service Level Agreements (SLA) for Online Services page .
There are currently two different REST APIs that allow model deployment. For the latest model deployment features such as the ability to specify a model version during deployment for models like text-embedding-ada-002 Version 2, use the Cognitive Services Create or Update REST API call.
Azure OpenAI quotas and limits
Azure OpenAI what's new
Azure OpenAI quickstarts
Article • 05/24/2023
Use this article to get started making your first calls to Azure OpenAI.
An Azure subscription - Create one for free .
Access granted to Azure OpenAI in the desired Azure subscription.
Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access. If you have an issue, open an issue on this repo to contact us.
An Azure OpenAI resource with a model deployed. For more information about model deployment, see the resource deployment guide.
I ran into an issue with the prerequisites.
Navigate to Azure OpenAI Studio at https://oai.azure.com/ and sign in with credentials that have access to your OpenAI resource. During or after the sign-in workflow, select the appropriate directory, Azure subscription, and Azure OpenAI resource.
From the Azure OpenAI Studio landing page navigate further to explore examples for prompt completion, manage your deployments and models, and find learning resources such as documentation and community forums.
Go to the Playground for experimentation and fine-tuning workflow.
Start exploring Azure OpenAI capabilities with a no-code approach through the GPT-3 Playground. It's simply a text box where you can submit a prompt to generate a completion. From this page, you can quickly iterate and experiment with the capabilities.
You can select a deployment and choose from a few pre-loaded examples to get started. If your resource doesn't have a deployment, select Create a deployment and follow the instructions provided by the wizard. For more information about model deployment, see the resource deployment guide.
You can experiment with the configuration settings such as temperature and pre-response text to improve the performance of your task. You can read more about each parameter in the REST API.
Selecting the Generate button will send the entered text to the completions API and stream the results back to the text box.
Select the Undo button to undo the prior generation call.
Select the Regenerate button to complete an undo and generation call together.
Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses may be filtered if harmful content is detected. For more information, see the content filter article.
In the GPT-3 playground you can also view Python and curl code samples pre-filled according to your selected settings. Just select View code next to the examples dropdown. You can write an application to complete the same task with the OpenAI Python SDK, curl, or other REST API client.
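To go from the playground to code, the underlying REST call is a POST to your resource's completions endpoint. Here's a sketch using only Python's standard library; the resource name, deployment name, and key below are placeholders you'd replace with your own values:

```python
import json
import os
import urllib.request

# Placeholders: substitute your own resource, deployment, and API key.
resource = "YOUR-RESOURCE-NAME"
deployment = "YOUR-DEPLOYMENT-NAME"
api_key = os.getenv("AZURE_OPENAI_KEY", "")

def build_completion_request(prompt, max_tokens=60, temperature=0.7):
    """Build the REST request for the Azure OpenAI completions endpoint."""
    url = (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/completions?api-version=2023-05-15"
    )
    body = json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
    )

req = build_completion_request("Write a tagline for an ice cream shop.")
print(req.full_url)
# Send with urllib.request.urlopen(req) once your key and resource are set.
```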
Try text summarization
To use the Azure OpenAI for text summarization in the GPT-3 Playground, follow these steps:
Sign in to Azure OpenAI Studio.
Select the subscription and OpenAI resource to work with.
Select GPT-3 Playground at the top of the landing page.
Select your deployment from the Deployments dropdown. If your resource doesn't have a deployment, select Create a deployment and then revisit this step.
Select Summarize Text from the Examples dropdown.
Select Generate. Azure OpenAI will attempt to capture the context of the text and rephrase it succinctly. You should get a result that resembles the following text:
Tl;dr A neutron star is the collapsed core of a supergiant star. These incredibly dense objects are incredibly fascinating due to their strange properties and their potential for phenomena such as extreme gravitational forces and a strong magnetic field.
The accuracy of the response can vary per model. The Davinci based model in this example is well-suited to this type of summarization, whereas a Codex based model wouldn't perform as well at this particular task.
I ran into an issue with the playground.
If you want to clean up and remove an OpenAI resource, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it.
Learn more about how to generate the best completion in our How-to guide on completions.
For more examples check out the Azure OpenAI Samples GitHub repository .
Article • 05/24/2023
Use this article to get started using Azure OpenAI.
An Azure subscription - Create one for free .
Access granted to Azure OpenAI in the desired Azure subscription.
Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access. If you have an issue, open an issue on this repo to contact us.
An Azure OpenAI Service resource with either the gpt-35-turbo or the gpt-4¹ models deployed. For more information about model deployment, see the resource deployment guide.
¹ GPT-4 models are currently only available by request. To access these models, existing Azure OpenAI customers can apply for access by filling out this form.
I ran into an issue with the prerequisites.
Navigate to Azure OpenAI Studio at https://oai.azure.com/ and sign in with credentials that have access to your OpenAI resource. During or after the sign-in workflow, select the appropriate directory, Azure subscription, and Azure OpenAI resource.
From the Azure OpenAI Studio landing page, select Chat playground.
Start exploring OpenAI capabilities with a no-code approach through the Azure OpenAI Studio Chat playground. From this page, you can quickly iterate and experiment with the capabilities.
Assistant setup
You can use the Assistant setup dropdown to select a few pre-loaded System message examples to get started.
System messages give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, tell it what it should and shouldn't answer, and tell it how to format responses.
Add few-shot examples allows you to provide conversational examples that are used by the model for in-context learning.
At any time while using the Chat playground you can select View code to see Python, curl, and json code samples pre-populated based on your current chat session and settings selections. You can then take this code and write an application to complete the same task you're currently performing with the playground.
Chat session
Selecting the Send button sends the entered text to the completions API and the results are returned back to the text box.
Select the Clear chat button to delete the current conversation history.
Settings
| Name | Description |
| --- | --- |
| Deployments | Your deployment name that is associated with a specific model. For ChatGPT, you need to use the gpt-35-turbo model. |
| Temperature | Controls randomness. Lowering the temperature means that the model produces more repetitive and deterministic responses. Increasing the temperature results in more unexpected or creative responses. Try adjusting temperature or Top P but not both. |
| Max length (tokens) | Set a limit on the number of tokens per model response. The API supports a maximum of 4096 tokens shared between the prompt (including system message, examples, message history, and user query) and the model's response. One token is roughly four characters for typical English text. |
| Top probabilities | Similar to temperature, this controls randomness but uses a different method. Lowering Top P narrows the model's token selection to likelier tokens. Increasing Top P lets the model choose from tokens with both high and low likelihood. Try adjusting temperature or Top P but not both. |
| Multi-turn conversations | Select the number of past messages to include in each new API request. This helps give the model context for new user queries. Setting this number to 10 results in five user queries and five system responses. |
| Stop sequences | Stop sequences make the model end its response at a desired point. The model response ends before the specified sequence, so it won't contain the stop sequence text. For ChatGPT, using <\|im_end\|> ensures that the model response doesn't generate a follow-up user query. You can include as many as four stop sequences. |
Show panels
By default there are three panels: assistant setup, chat session, and settings. Show panels allows you to add, remove, and rearrange the panels. If you ever close a panel and need to get it back, use Show panels to restore the lost panel.
From the assistant setup drop-down, select Xbox customer support agent.
You'll be prompted to confirm that you want to update the system message. Select Continue.
In the chat session pane, enter the following question: "I'm interested in buying a new Xbox", and select Send.
You'll receive a response from the assistant.
Enter a follow-up question like: "which models support 4K?"
Now that you have a basic conversation select View code from under Assistant setup and you'll have a replay of the code behind the entire conversation so far:
```python
# Note: The openai-python library support for Azure OpenAI is in preview.
import os
import openai

openai.api_type = "azure"
openai.api_base = "https://docs-test-001.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.ChatCompletion.create(
    # Replace this value with the deployment name you chose when you
    # deployed the associated model.
    engine="gpt-35-turbo",
    messages=[
        {"role": "system", "content": "You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Xbox."},
        {"role": "user", "content": "How much is a PS5?"},
        {"role": "assistant", "content": "I apologize, but I do not have information about the prices of other gaming devices such as the PS5. My primary focus is to assist with issues regarding Xbox devices. Is there a specific issue you are having with your Xbox device that I may be able to help with?"},
        {"role": "user", "content": "I am interested in buying an Xbox?"},
        {"role": "assistant", "content": "Great! There are several Xbox models available in the market, including the Xbox Series X, Xbox Series S, and Xbox One. The Xbox Series X is the latest and most powerful console, while the Xbox Series S is a more affordable option that still offers great performance. The Xbox One is the previous generation console, but it still has a large library of games available to play.\n\nBefore purchasing an Xbox, you may want to consider factors such as your budget, the types of games you want to play, and whether you want to play games in 4K resolution. Once you have decided on a console, you can purchase it from a retailer or directly from the Xbox website. Let me know if you have any other questions or concerns!"},
        {"role": "user", "content": "Which models are 4K?"},
        {"role": "assistant", "content": "Both the Xbox Series X and Xbox One X support 4K resolution gaming. The Xbox Series S, on the other hand, supports up to 1440p resolution gaming, but it can upscale games to 4K. It's worth noting that in order to take advantage of 4K gaming, you'll need a 4K TV or monitor that supports HDR (High Dynamic Range) and HDMI 2.0 or higher."},
    ],
    temperature=0,
    max_tokens=350,
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None,
)
```
I ran into an issue with the playground.
Understanding the prompt structure
If you examine the sample from View code, you'll notice some unique tokens that weren't part of a typical GPT completion call. ChatGPT was trained to use special tokens to delineate different parts of the prompt. Content is provided to the model between <|im_start|> and <|im_end|> tokens. The prompt begins with a system message that can be used to prime the model by including context or instructions for the model. After that, the prompt contains a series of messages between the user and the assistant.
The assistant's response to the prompt will then be returned below the <|im_start|>assistant token and will end with <|im_end|>, denoting that the assistant has finished its response. You can also use the Show raw syntax toggle button to display these tokens within the chat session panel.
The ChatGPT how-to guide provides an in-depth introduction into the new prompt structure and how to use the gpt-35-turbo model effectively.
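To make the token description above concrete, here's an illustrative sketch of what such a raw prompt might look like. The Chat Completion API assembles this for you, and the message content shown here is invented purely for illustration:

```
<|im_start|>system
You are a helpful assistant.
<|im_end|>
<|im_start|>user
What's the capital of France?
<|im_end|>
<|im_start|>assistant
```

The model then generates the assistant's reply after the final `<|im_start|>assistant` token, finishing with `<|im_end|>`.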
Once you're done testing out the Chat playground, if you want to clean up and remove an OpenAI resource, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it.
Learn more about how to work with ChatGPT and the new gpt-35-turbo model with the ChatGPT how-to guide.
For more examples, check out the Azure OpenAI Samples GitHub repository.
Article • 04/14/2023
Azure OpenAI Service includes a content management system that works alongside core models to filter content. This system works by running both the input prompt and generated content through an ensemble of classification models aimed at detecting misuse. If the system identifies harmful content, you'll receive either an error on the API call if the prompt was deemed inappropriate, or the finish_reason on the response will be content_filter to signify that some of the generation was filtered. You can generate content with the completions API using many different configurations that will alter the filtering behavior you should expect. The following section aims to enumerate all of these scenarios for you to appropriately design your solution.
To ensure you have properly mitigated risks in your application, you should evaluate all potential harms carefully, follow guidance in the Transparency Note and add scenario-specific mitigation as needed.
When building your application, you'll want to account for scenarios where the content returned by the Completions API is filtered and content may not be complete. How you act on this information will be application specific. The behavior can be summarized in the following key points:
Prompts that are deemed inappropriate will return an HTTP 400 error.
Non-streaming completions calls won't return any content when the content is filtered. The finish_reason value will be set to content_filter. In rare cases with long responses, a partial result can be returned. In these cases, the finish_reason will be updated.
For streaming completions calls, segments will be returned back to the user as they're completed. The service will continue streaming until reaching a stop token or length limit, or until harmful content is detected.
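Client code therefore needs to track how each streamed generation ends. A minimal sketch over already-parsed chunks; in a real application these dictionaries would come from the streaming response, and the helper name is ours:

```python
def collect_stream(chunks):
    """Accumulate streamed completion chunks per generation index and
    record how each generation finished (stop, length, or content_filter)."""
    texts = {}
    finish_reasons = {}
    for chunk in chunks:
        for choice in chunk["choices"]:
            i = choice["index"]
            texts[i] = texts.get(i, "") + choice.get("text", "")
            if choice.get("finish_reason") is not None:
                finish_reasons[i] = choice["finish_reason"]
    return texts, finish_reasons

# Hypothetical chunks: index 0 completes normally, index 1 is filtered.
chunks = [
    {"choices": [{"index": 0, "text": "Hello", "finish_reason": None}]},
    {"choices": [{"index": 0, "text": " world", "finish_reason": "stop"}]},
    {"choices": [{"index": 1, "text": "", "finish_reason": "content_filter"}]},
]
texts, reasons = collect_stream(chunks)
print(reasons)  # {0: 'stop', 1: 'content_filter'}
```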
Scenario: You send a non-streaming completions call asking for multiple generations with no inappropriate content
The table below outlines the various ways content filtering can appear:
| HTTP response code | Response behavior |
| --- | --- |
| 200 | In the cases when all generation passes the filter models, no content moderation details are added to the response. The finish_reason for each generation will be either stop or length. |
Example request payload:
```json
{
  "prompt": "Text example",
  "n": 3,
  "stream": false
}
```
Example response JSON:
```json
{
  "id": "example-id",
  "object": "text_completion",
  "created": 1653666286,
  "model": "davinci",
  "choices": [
    {
      "text": "Response generated text",
      "index": 0,
      "finish_reason": "stop",
      "logprobs": null
    }
  ]
}
```
Scenario: Your API call asks for multiple responses (N>1) and at least 1 of the responses is filtered
| HTTP response code | Response behavior |
| --- | --- |
| 200 | The generations that were filtered will have a finish_reason value of content_filter. |
Example request payload:
```json
{
  "prompt": "Text example",
  "n": 3,
  "stream": false
}
```
Example response JSON:
```json
{
  "id": "example",
  "object": "text_completion",
  "created": 1653666831,
  "model": "ada",
  "choices": [
    {
      "text": "returned text 1",
      "index": 0,
      "finish_reason": "length",
      "logprobs": null
    },
    {
      "text": "returned text 2",
      "index": 1,
      "finish_reason": "content_filter",
      "logprobs": null
    }
  ]
}
```
Scenario: An inappropriate input prompt is sent to the completions API (either for streaming or non-streaming)
| HTTP response code | Response behavior |
| --- | --- |
| 400 | The API call will fail when the prompt triggers one of our content policy models. Modify the prompt and try again. |
Example request payload:
```json
{
  "prompt": "Content that triggered the filtering model"
}
```
Example response JSON:
```json
{
  "error": {
    "message": "The response was filtered",
    "type": null,
    "param": "prompt",
    "code": "content_filter",
    "status": 400
  }
}
```
Scenario: You make a streaming completions call with all generated content passing the content filters
| HTTP response code | Response behavior |
| --- | --- |
| 200 | In this case, the call will stream back with the full generation, and finish_reason will be either length or stop for each generated response. |
Example request payload:
```json
{
  "prompt": "Text example",
  "n": 3,
  "stream": true
}
```
Example response JSON:
```json
{
  "id": "cmpl-example",
  "object": "text_completion",
  "created": 1653670914,
  "model": "ada",
  "choices": [
    {
      "text": "last part of generation",
      "index": 2,
      "finish_reason": "stop",
      "logprobs": null
    }
  ]
}
```
Scenario: You make a streaming completions call asking for multiple generated responses and at least one response is filtered
| HTTP response code | Response behavior |
| --- | --- |
| 200 | For a given generation index, the last chunk of the generation will include a non-null finish_reason value. The value will be content_filter when the generation was filtered. |
Example request payload:
```json
{
  "prompt": "Text example",
  "n": 3,
  "stream": true
}
```
Example response JSON:
```json
{
  "id": "cmpl-example",
  "object": "text_completion",
  "created": 1653670515,
  "model": "ada",
  "choices": [
    {
      "text": "Last part of generated text streamed back",
      "index": 2,
      "finish_reason": "content_filter",
      "logprobs": null
    }
  ]
}
```
Scenario: Content filtering system doesn't run on the generation
| HTTP response code | Response behavior |
| --- | --- |
| 200 | If the content filtering system is down or otherwise unable to complete the operation in time, your request will still complete. You can determine that the filtering wasn't applied by looking for an error message in the content_filter_result object. |
Example request payload:
```json
{
  "prompt": "Text example",
  "n": 1,
  "stream": false
}
```
Example response JSON:
```json
{
  "id": "cmpl-example",
  "object": "text_completion",
  "created": 1652294703,
  "model": "ada",
  "choices": [
    {
      "text": "generated text",
      "index": 0,
      "finish_reason": "length",
      "logprobs": null,
      "content_filter_result": {
        "error": {
          "code": "content_filter_error",
          "message": "The contents are not filtered"
        }
      }
    }
  ]
}
```
As part of your application design, you'll need to think carefully about how to maximize the benefits of your applications while minimizing the harms. Consider the following best practices:
Decide how you want to handle scenarios where your users send inappropriate prompts or misuse your application. Check the finish_reason to see if the generation is filtered.
If it's critical that the content filters run on your generations, check that there's no error object in the content_filter_result.
To help with monitoring for possible misuse, applications serving multiple end-users should pass the user parameter with each API call. The user value should be a unique identifier for the end-user. Don't send any actual user identifiable information as the value.
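These checks are easy to centralize. A minimal sketch (the function name and labels are ours) that classifies each returned choice; the user parameter itself is simply an extra field on the request body, e.g. `{"prompt": ..., "user": "<opaque-end-user-id>"}`:

```python
def review_choice(choice):
    """Classify a completions choice per the best practices above:
    flag filtered generations and cases where the filter didn't run."""
    if choice.get("finish_reason") == "content_filter":
        return "filtered"
    # An "error" entry under content_filter_result means the content
    # filtering system did not run on this generation.
    if "error" in choice.get("content_filter_result", {}):
        return "unfiltered_error"
    return "ok"

print(review_choice({"finish_reason": "content_filter"}))  # filtered
print(review_choice({
    "finish_reason": "length",
    "content_filter_result": {"error": {"code": "content_filter_error"}},
}))  # unfiltered_error
print(review_choice({"finish_reason": "stop"}))  # ok
```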
Learn more about the underlying models that power Azure OpenAI.
Article • 05/10/2023
An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar.
Different Azure OpenAI embedding models are specifically created to be good at a particular task. Similarity embeddings are good at capturing semantic similarity between two or more pieces of text. Text search embeddings help measure whether long documents are relevant to a short query. Code search embeddings are useful for embedding code snippets and embedding natural language search queries.
Embeddings make it easier to do machine learning on large inputs representing words by capturing the semantic similarities in a vector space. Therefore, we can use embeddings to determine if two text chunks are semantically related or similar, and provide a score to assess similarity.
Azure OpenAI embeddings rely on cosine similarity to compute similarity between documents and a query.
An alternative method of identifying similar documents is to count the number of common words between documents. Unfortunately, this approach doesn't scale, since an expansion in document size is likely to lead to a greater number of common words detected even among completely disparate topics. For this reason, cosine similarity can offer a more effective alternative.
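Cosine similarity itself is straightforward to compute: it is the dot product of the two vectors divided by the product of their magnitudes. A minimal sketch in plain Python (in practice you would typically use numpy over the embedding vectors returned by the API):

```python
import math

def cosine_similarity(a, b):
    # cosine(a, b) = dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0
```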
Learn more about using Azure OpenAI and embeddings to perform document search with our embeddings tutorial.
Article • 05/19/2023
The term red teaming has historically described systematic adversarial attacks for testing security vulnerabilities. With the rise of LLMs, the term has extended beyond traditional cybersecurity and evolved in common usage to describe many kinds of probing, testing, and attacking of AI systems. With LLMs, both benign and adversarial usage can produce potentially harmful outputs, which can take many forms, including harmful content such as hate speech, incitement or glorification of violence, or sexual content.
Red teaming is an essential practice in the responsible development of systems and features using LLMs. While not a replacement for systematic measurement and mitigation work, red teamers help to uncover and identify harms and, in turn, enable measurement strategies to validate the effectiveness of mitigations.
Microsoft has conducted red teaming exercises and implemented safety systems (including content filters and other mitigation strategies) for its Azure OpenAI Service models (see this Responsible AI Overview). However, the context of your LLM application will be unique and you also should conduct red teaming to:
Test the LLM base model and determine whether there are gaps in the existing safety systems, given the context of your application system.
Identify and mitigate shortcomings in the existing default filters or mitigation strategies.
Provide feedback on failures so we can make improvements.
Here is how you can get started in your process of red teaming LLMs. Advance planning is critical to a productive red teaming exercise.
Managing your red team
Assemble a diverse group of red teamers.
LLM red teamers should be a mix of people with diverse social and professional backgrounds, demographic groups, and interdisciplinary expertise that fits the deployment context of your AI system. For example, if you’re designing a chatbot to help health care providers, medical experts can help identify risks in that domain.
Recruit red teamers with both benign and adversarial mindsets.
Having red teamers with an adversarial mindset and security-testing experience is essential for understanding security risks, but red teamers who are ordinary users of your application system and haven’t been involved in its development can bring valuable perspectives on harms that regular users might encounter.
Remember that handling potentially harmful content can be mentally taxing.
You will need to take care of your red teamers, not only by limiting the amount of time they spend on an assignment, but also by letting them know they can opt out at any time. Also, avoid burnout by switching red teamers’ assignments to different focus areas.
Planning your red teaming
Where to test
Because a system is developed using an LLM base model, you may need to test at several different layers:
The LLM base model with its safety system in place to identify any gaps that may need to be addressed in the context of your application system. (Testing is usually through an API endpoint.)
Your application system. (Testing is usually through a UI.)
Both the LLM base model and your application system before and after mitigations are in place.
How to test
Consider conducting iterative red teaming in at least two phases:
Open-ended red teaming, where red teamers are encouraged to discover a variety of harms. This can help you develop a taxonomy of harms to guide further testing. Note that developing a taxonomy of undesired LLM outputs for your application system is crucial to being able to measure the success of specific mitigation efforts.
Guided red teaming, where red teamers are assigned to focus on specific harms
listed in the taxonomy while staying alert for any new harms that may emerge. Red teamers can also be instructed to focus testing on specific features of a system for surfacing potential harms.
Be sure to:
Provide your red teamers with clear instructions for what harms or system features they will be testing.
Give your red teamers a place for recording their findings. For example, this could be a simple spreadsheet specifying the types of data that red teamers should provide, including basics such as:
The type of harm that was surfaced.
The input prompt that triggered the output.
An excerpt from the problematic output.
Comments about why the red teamer considered the output problematic.
Maximize the effort of responsible AI red teamers who have expertise for testing specific types of harms or undesired outputs. For example, have security subject matter experts focus on jailbreaks, metaprompt extraction, and content related to aiding cyberattacks.
Reporting red teaming findings
You will want to summarize and report red teaming top findings at regular intervals to key stakeholders, including teams involved in the measurement and mitigation of LLM failures so that the findings can inform critical decision making and prioritizations.
Learn about other mitigation strategies like prompt engineering
Article • 03/21/2023
GPT-3, GPT-3.5, and GPT-4 models from OpenAI are prompt-based. With prompt-based models, the user interacts with the model by entering a text prompt, to which the model responds with a text completion. This completion is the model’s continuation of the input text.
While these models are extremely powerful, their behavior is also very sensitive to the prompt. This makes prompt construction an important skill to develop.
Prompt construction can be difficult. In practice, the prompt acts to configure the model weights to complete the desired task, but it's more of an art than a science, often requiring experience and intuition to craft a successful prompt. The goal of this article is to help get you started with this learning process. It attempts to capture general concepts and patterns that apply to all GPT models. However it's important to understand that each model behaves differently, so the learnings may not apply equally to all models.
This section covers the basic concepts and elements of GPT prompts.
Text prompts are how users interact with GPT models. As with all generative language models, GPT models attempt to produce the next series of words that are most likely to follow from the previous text. It's as if we're saying "What is the first thing that comes to your mind when I say <prompt>?". The examples below demonstrate this behavior. Given the first words of famous content, the model is able to accurately continue the text.
| Prompt | Completion |
| --- | --- |
| Four score | and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. […] |
| "Call me Ishmael." | "Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world." […] |
As you develop more complex prompts, it's helpful to keep this fundamental behavior in mind. Regardless of the prompt that is provided, the model is simply responding with what it determines is most likely (given its training data and training targets). If you ask
a question in your prompt, the model isn't following a separate “Q&A” code path, but rather it appears to answer the question because an answer is the most likely sort of response for the given question as input.
Prompt components
When using the Completion API, there's no differentiation between different parts of the prompt, but it can still be useful for learning and discussion to identify the underlying prompt components. With the Chat Completion API there are distinct sections of the prompt that are sent to the API in the form of an array of dictionaries with associated roles: system, user, and assistant. This guidance focuses more generally on how to think about prompt construction rather than providing prescriptive guidance that is specific to one API over another.
It's also important to understand that while there could be other valid ways to dissect prompts, the goal of this breakdown is to provide a relatively simple way to think about prompt construction. With the Completion API, all the components are optional, but at least one must be present and most prompts include more than one component. There can be some grey area between components as well. The order presented below roughly corresponds to how commonly each component is used, from most to least.
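As a concrete sketch, the components discussed here can be assembled into a Chat Completion-style messages array. The helper below is illustrative only (it is not part of any SDK); it simply shows how instructions, optional examples, and primary content map onto the system/user/assistant roles:

```python
def build_messages(instructions, primary_content, examples=None):
    """Assemble a Chat Completion-style messages array from prompt components.

    `instructions` becomes the system message; optional (user, assistant)
    example pairs are inserted before the primary content.
    """
    messages = [{"role": "system", "content": instructions}]
    for user_text, assistant_text in (examples or []):
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The primary content goes in the final user turn.
    messages.append({"role": "user", "content": primary_content})
    return messages

msgs = build_messages(
    "You are a translator. Translate the user's text to French.",
    "Can you please tell me how to get to the museum?",
)
```

With the Completion API the same components would instead be concatenated into a single prompt string.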
Instructions
Instructions are likely the most commonly used prompt component. Instructions are simply that: instructions to the model on what to do. While simple in concept, they can be complex in practice. The following table illustrates this by showing both simple and more complex instructions, in two different forms.
| Instruction Type | Prompt | Completion |
| --- | --- | --- |
| Simple instructions | Write an introduction for a weekly newsletter. | Welcome to our weekly newsletter. This week we have a special focus on productivity. We'll be sharing tips and tricks on how to be productive in all areas of your life. We hope you enjoy and please share with your friends! |
| Complex instructions | Write an introduction for a weekly newsletter to my company named Contoso. It should mention the great all hands meeting we had last week and thank the team for all their hard work over the tough past few months. | Welcome to Contoso's weekly newsletter! We had a great all hands meeting last week and I want to thank the team for all their hard work over the tough past few months. We're looking forward to a bright future and I'm confident that with our team's dedication and commitment, we will achieve great things. Thank you for being a part of our team and I look forward to working with you all in the coming weeks and months. Sincerely, [Your name]. |
| Complex instructions | Write an introduction for a weekly newsletter, including the following:<br>- My company is named contoso<br>- An exuberant greeting<br>- Thanking the team for their hard work over a couple tough months<br>- Positive outlook for the coming quarter<br>- Signed by the SLT | Hello Contoso team! I wanted to take a moment to thank you for all your hard work over the past few months. It's been a tough couple of months, but you've all pulled together and done an amazing job. I'm really proud of you all. Looking ahead to the next quarter, I'm feeling really positive. I think we're going to achieve some great things. Thanks again, The Senior Leadership Team. |
Primary Content
Primary content refers to some sort of text that is being processed or transformed by the model. Primary content is typically used in conjunction with instructions. A simple example would be language translation. In the example below, the English text would be considered the primary content while “Translate to French:” is an instruction.
| Prompt | Completion |
| --- | --- |
| Can you please tell me how to get to the museum?<br>Translate to French: | Pouvez-vous s'il vous plaît me dire comment aller au musée? |
Primary content can also be much longer. In the following example the primary content is the introduction section from the Wikipedia entry on time complexity , which is almost 300 words long. This content has been abbreviated in our table for ease of display.
| Prompt | Completion |
| --- | --- |
| In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by [...]<br>Summarize the previous text in a few sentences: | Time complexity is a measure of how long an algorithm takes to run. It is typically expressed as a function of the size of the input. The most common measure is worst-case time complexity, which is the maximum amount of time required for inputs of a given size. |
| In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by [...]<br>TL;DR (Too long didn't read) | The time complexity of an algorithm is the amount of time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. The time complexity is generally expressed as a function of the size of the input. |
GPT models can also handle primary content that is structured. In the example below, a TSV about beers in Belgium (abbreviated for display in this article) is passed to the model as part of the prompt. It is capable of properly interpreting the content and answering questions about the data.
| Prompt | Completion |
| --- | --- |
| Belgian Beer Brands<br>Beer name Beer style color ABV Brewery<br>"Chimay Gold: ""Doree""" trappist pale ale gold 4.80% Chimay Brewery (official trappist)<br>"Chimay Blue: ""Grande Reserve""" trappist dark ale dark 9.00%<br>[…]<br>Referencing the table above, how many beers are less than 6% ABV? | There are 10 beers in the table that are less than 6% ABV. |
Examples
Successful prompts often rely on the practice of “one-shot” or “few-shot” learning. This refers to the inclusion of one or more examples of the desired behavior of the model, typically by including input and output pairs. This is not learning in the sense that the model is permanently changed, but rather that the examples better condition the model
to respond as desired for only the current inference. The use of prompts with no examples is sometimes referred to as “zero-shot” learning. Please note that with the Chat Completion API few-shot learning examples are typically added to the messages array in the form of example user/assistant interactions after the initial system message.
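In Chat Completion terms, a few-shot headline-classification prompt is expressed as example user/assistant turns placed after the system message. The array below is a sketch of that shape (the system message wording is an assumption, not from any SDK):

```python
# Few-shot examples condition the model for the current inference only;
# they do not permanently change the model.
few_shot_messages = [
    {"role": "system", "content": "Classify news headlines by topic."},
    # Example interactions (the "shots"):
    {"role": "user", "content": "Headline: Twins' Correa to use opt-out, test free agency"},
    {"role": "assistant", "content": "Topic: Baseball"},
    {"role": "user", "content": "Headline: Qatar World Cup to have zones for sobering up"},
    {"role": "assistant", "content": "Topic: Soccer"},
    # The actual input to classify comes last:
    {"role": "user", "content": "Headline: Coach confident injury won't derail Warriors"},
]
```

Removing the example turns turns this into a zero-shot prompt.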
| Learning Type | Prompt | Completion |
| --- | --- | --- |
| Zero-shot | Headline: Coach confident injury won't derail Warriors<br>Topic: | The coach is confident that the injury won't derail the Warriors' season. The team is still focused on their goals and that they will continue to work hard to achieve them. |
| Few-shot | Headline: Twins' Correa to use opt-out, test free agency<br>Topic: Baseball<br>Headline: Qatar World Cup to have zones for sobering up<br>Topic: Soccer<br>Headline: Yates: Fantasy football intel for Week 6<br>Topic: Football<br>Headline: Coach confident injury won't derail Warriors<br>Topic: | Basketball |
The example above illustrates the utility of few-shot learning. Without the examples, the model seems to be guessing at the desired behavior, while the examples cleanly show the model how to operate. This also demonstrates the power of the model: it can infer the category of label that is wanted, even without a 'basketball' label in the examples.
Cue
Cues act as the “jumpstart” for the output of the model, helping to direct the model to the desired output. It is often a prefix that the model can build onto. In the example
below, we show the cue directing the story to include key elements. Cues are often used along with instructions, but not always. In the examples below, the model is being used to summarize a long announcement email. A cue is used in one case to help focus the output of the model and to suggest a specific output format (bullet points) in the other.
In the context of the Chat Completion API instructions take the form of the system message, and few-shot examples are indicated by providing a series of example user/assistant interactions to help prime the model.
| Number of Cues | Prompt | Completion |
| --- | --- | --- |
| 0 | Performance reviews are a core component of our Performance & Development approach at Contoso. […] Summarize the above email message: | The purpose of this email is to inform employees of recent changes to the performance review experience, which is part of Contoso's Performance & Development approach. The changes include an improved structure, upgraded in-tool user experience, and focused prompts. Additionally, there is now an optional Conversation Starters section for employees to flag topics they'd like to discuss with their managers. |
| 1 | Performance reviews are a core component of our Performance & Development approach at Contoso. […] Summarize the above email message:<br>The key takeaway of the message is | that Contoso is improving the performance review experience by making it more focused on core priorities, expectations and impact. Additionally, they are upgrading the in-tool user experience with tips, resources and guidance. Finally, they are adding guidance on how to activate company-wide core priorities. |
| Number of Cues | Prompt | Completion |
| --- | --- | --- |
| 2 | Performance reviews are a core component of our Performance & Development approach at Contoso. […] Summarize the above email message:<br>Key Points:<br>• | Contoso is introducing changes to the performance review experience, based on feedback from employees.<br>• The changes include a new focus on core priorities, expectations, and impact; an upgraded user experience; and new guidance on company-wide core priorities.<br>• Performance reviews are a time to reflect on past performance and learning as well as to plan for future priorities and growth opportunities.<br>• Employees and managers should write comments in advance of the performance review meeting.<br>• There should be at least two performance reviews per year.<br>• The time investment for each review should remain about the same. |
Supporting content
Supporting content is information that the model can utilize to influence the output in some way. It differs from primary content in that it is not the main target of the task, but it is typically used along with primary content. Common examples include contextual information such as the current date, the name of the user, user preferences, and so on. The example below uses supporting content to help organize a set of planned workshops for the user. Without the supporting content (important topics), the model simply lists off the workshops (truncated for display); when informed of my important topics, the model is able to accurately group the meetings.
| Supporting Content | Prompt | Completion |
| --- | --- | --- |
| Without | Workshop list:<br>Title Description<br>Planning Workshop Kickoff Please join us to kick off our workshop! Hear from our leaders about cross group priorities and learnings for the year ahead. The invitation to this event has already been sent to everyone in […]<br>Summarize the workshops above, grouped by topic: | Workshop list:<br>- Planning Workshop Kickoff<br>- Planning Workshop Wrap-up<br>- Core MeetUp<br>- Customer Driven Perspectives<br>- Impact and Key Focus Areas<br>- Best practices<br>- Accelerating app adoption […] |
| With | Workshop list:<br>Title Description<br>Planning Workshop Kickoff Please join us to kick off our workshop! Hear from our leaders about cross group priorities and learnings for the year ahead. The invitation to this event has already been sent to everyone in […]<br>My Important Topics: prompt engineering, search, GPT models<br>Summarize the workshops above, grouped by my important topics: | Prompt Engineering:<br>- New prompt engineering tips<br>- The art of prompt engineering<br>Search:<br>- Intro to vector search with embedding<br>GPT Models:<br>- Intro to GPT-4<br>- ChatGPT in-depth. |
Best practices

- Be Specific. Leave as little to interpretation as possible. Restrict the operational space.
- Be Descriptive. Use analogies.
- Double Down. Sometimes you may need to repeat yourself to the model. Give instructions before and after your primary content, use an instruction and a cue, etc.
- Order Matters. The order in which you present information to the model may impact the output. Whether you put instructions before your content ("summarize the following…") or after ("summarize the above…") can make a difference in output. Even the order of few-shot examples can matter. This is referred to as recency bias.
- Give the model an "out". It can sometimes be helpful to give the model an alternative path if it is unable to complete the assigned task. For example, when asking a question over a piece of text you might include something like "respond with 'not found' if the answer is not present". This can help the model avoid generating false responses.
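The "give the model an out" practice can be captured in a small prompt template. This is a minimal sketch; the function name, variable names, and section labels are illustrative, not part of any API:

```python
def grounded_question_prompt(source_text, question):
    """Build a question-answering prompt that gives the model an 'out':
    it is told to reply "not found" when the answer is absent from the text."""
    return (
        "Answer the question using only the text below. "
        'If the answer is not present in the text, respond with "not found".\n'
        "---\n"
        f"TEXT\n{source_text}\n"
        "---\n"
        f"QUESTION: {question}\n"
        "ANSWER:"
    )

prompt = grounded_question_prompt(
    "Azure OpenAI provides REST API access to OpenAI's language models.",
    "What year was the service launched?",
)
```

Here the trailing "ANSWER:" also acts as a cue, and the explicit "not found" path discourages the model from fabricating an answer.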
Space efficiency

While the input size increases with each new generation of GPT models, there will continue to be scenarios that provide more data than the model can handle. GPT models break words into "tokens". While common multi-syllable words are often a single token, less common words are broken into syllables. Tokens can sometimes be counter-intuitive: token boundaries differ across date formats, for example, and spelling out the entire month can be more space efficient than a fully numeric date. The current range of token support goes from 2000 tokens with earlier GPT-3 models to up to 32,768 tokens with the 32k version of the latest GPT-4 model.

Given this limited space, it is important to use it as efficiently as possible.

- Tables – As shown in the examples in the previous section, GPT models can understand tabular formatted data quite easily. This can be a space efficient way to include data, rather than preceding every field with a name (such as with JSON).
- White Space – Consecutive whitespaces are treated as separate tokens, which can be an easy way to waste space. Spaces preceding a word, on the other hand, are typically treated as part of the same token as the word. Carefully watch your usage of whitespace, and don't use punctuation when a space alone will do.
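As a rough illustration of the whitespace point, a small helper can collapse redundant runs of spaces before a prompt is sent. For exact token counts you would use a tokenizer library such as tiktoken; this sketch deliberately avoids any tokenizer dependency:

```python
import re

def compact_prompt(text):
    """Collapse runs of spaces/tabs into single spaces and trim the ends.
    Consecutive whitespace characters are typically tokenized separately,
    so removing them is an easy way to save prompt space."""
    return re.sub(r"[ \t]+", " ", text).strip()

cleaned = compact_prompt("Summarize   the    text")
```

This is a blunt instrument: don't apply it to content where whitespace is meaningful, such as code or tabular data.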
Prompt engineering techniques

Article • 04/23/2023
This guide will walk you through some advanced techniques in prompt design and prompt engineering. If you're new to prompt engineering, we recommend starting with our introduction to prompt engineering guide.
While the principles of prompt engineering can be generalized across many different model types, certain models expect a specialized prompt structure. For Azure OpenAI GPT models, there are currently two distinct APIs where prompt engineering comes into play:
- Chat Completion API
- Completion API
Each API requires input data to be formatted differently, which in turn impacts overall prompt design. The Chat Completion API supports the ChatGPT and GPT-4 models. These models are designed to take input formatted in a specific chat-like transcript stored inside an array of dictionaries.
The Completion API supports the older GPT-3 models and has much more flexible input requirements in that it takes a string of text with no specific format rules. Technically the ChatGPT models can be used with either API, but we strongly recommend using the Chat Completion API for these models. To learn more, please consult our in-depth guide on using these APIs.
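A minimal sketch of the two input shapes, reduced to the prompt-related fields (consult the REST API reference for the full request format; field values here are illustrative):

```python
# Completion API: one free-form prompt string (older GPT-3 models).
completion_request = {
    "prompt": "Translate to French: Can you please tell me how to get to the museum?",
    "max_tokens": 60,
}

# Chat Completion API: an array of role-tagged messages
# (ChatGPT and GPT-4 models).
chat_request = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Can you please tell me how to get to the museum?"},
    ],
    "max_tokens": 60,
}
```

The structural difference is the whole point: with Completions you design one string, while with Chat Completions you design a transcript.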
The techniques in this guide will teach you strategies for increasing the accuracy and grounding of responses you generate with a Large Language Model (LLM). It is, however, important to remember that even when using prompt engineering effectively, you still need to validate the responses the models generate. Just because a carefully crafted prompt worked well for a particular scenario doesn't necessarily mean it will generalize more broadly to other use cases. Understanding the limitations of LLMs is just as important as understanding how to leverage their strengths.
This guide does not go in-depth into the mechanics behind the message structure for Chat Completions. If you aren't familiar with interacting with the ChatGPT and GPT-4 models programmatically we recommend reading our how-to guide on the Chat Completion API first.
System message
The system message is included at the beginning of the prompt and is used to prime the model with context, instructions, or other information relevant to your use case. You can use the system message to describe the assistant’s personality, define what the model should and shouldn’t answer, and define the format of model responses.
The example below, shows a sample system message and the resulting model response:
| System message | User | Assistant |
| --- | --- | --- |
| You are an AI assistant that helps people find information and responds in rhyme. If the user asks you a question you don't know the answer to, say so. | What can you tell about me, John Doe? | Dear John, I'm sorry to say,<br>But I don't have info on you today.<br>I'm just an AI with knowledge in my brain,<br>But without your input, I can't explain.<br>So please tell me more about what you seek,<br>And I'll do my best to give you an answer unique. |
Some other examples of system messages are:
“Assistant is a large language model trained by OpenAI.”
“Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you're not sure of an answer, you can say 'I don't know'.”
“Assistant is an intelligent chatbot designed to help users answer their tax related questions.”
“You are an assistant designed to extract entities from text. Users will paste in a string of text and you will respond with entities you've extracted from the text as a
JSON object. Here's an example of your output format:
```json
{
    "name": "",
    "company": "",
    "phone_number": ""
}
```
An important detail to understand is that even if you instruct a model in the system message to answer "I don't know" when unsure of an answer, this does not guarantee that the request will be honored. A well designed system message can increase the likelihood of a certain outcome, but it is still possible that an incorrect response could be generated that contradicts the intent of the instruction in the system message.
Few-shot learning

A common way to adapt language models to new tasks is to use few-shot learning. In few-shot learning, a set of training examples is provided as part of the prompt to give additional context to the model.
When using the Chat Completions API, a series of messages between the User and Assistant (written in the new prompt format), can serve as examples for few-shot learning. These examples can be used to prime the model to respond in a certain way, emulate particular behaviors, and seed answers to common questions.
| System message | Few-shot examples |
| --- | --- |
| Assistant is an intelligent chatbot designed to help users answer their tax related questions.<br>Instructions<br>- Only answer questions related to taxes.<br>- If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users go to the IRS website for more information. | User - "When do I need to file my taxes by?"<br>Assistant - "In 2023, you will need to file your taxes by April 18th. The date falls after the usual April 15th deadline because April 15th falls on a Saturday in 2023. For more details, see https://www.irs.gov/filing/individuals/when-to- "<br>User - "How can I check the status of my tax refund?"<br>Assistant - "You can check the status of your tax refund by visiting https://www.irs.gov/refunds " |
The table above describes the basic prompt structure, but for a detailed explanation of the exact prompt format you will need to refer to the Chat Completion guide.
Non chat scenarios

While the Chat Completion API is optimized to work with multi-turn conversations, it also can be used for non chat scenarios. For example, for a sentiment analysis scenario, you might use the following prompt:
| System message | User | Assistant |
| --- | --- | --- |
| You are an assistant designed to analyze sentiment from speech data. Users will paste in a string of text and you will respond with an assessment of the response of the speaker. Rate on a scale of 1-10 (10 being highest). Explain why this rating was given. | hi thank you for calling contoso pharmacy who am i speaking with today i understand your frustration mary um can you tell me what exactly you're trying to accomplish OK uh certainly happy to check that for you one moment please i see here that you were on a generic form of lipitor is that right OK uh so i see that your doctor stopped prescribing these drugs in twenty twenty one actually OK uh yeah i'd be happy to check that for you uh because sometimes there's a gap in the system and it just doesn't get reported so let me take a look here just a moment so i'm seeing here that your doctor had these drugs prescribed to you from twenty twelve through twenty twenty one OK um yeah according to the latest records uh provided by doctor's office you're now on a different drug metformin would you like us to go ahead and fill that prescription for you for pick up a bit of form in five hundred milligrams uh you're very welcome mary please let us know if there's anything else we can do for you today. | Sentiment rating: 8<br>The speaker in this text seems to be polite, helpful, and patient while addressing the concerns of the customer. They take the time to check the information, provide clear explanations, and offer assistance. However, the rating is not a 10 because there is still a hint of uncertainty in some parts of the conversation, which may not convey complete confidence to the customer. |
Start with clear instructions

The sequence in which information appears in the prompt matters. This is because GPT style models are built in a certain way that defines how they process the input. Our research suggests that telling the model the task you want it to do at the beginning of the prompt, before sharing additional contextual information or examples, can help produce higher-quality outputs.
Note
Although following this technique is still generally recommended, in contrast to previous model versions (GPT-3 and prior), our testing showed that the model
response with ChatGPT and GPT-4 models was the same regardless of whether the technique is utilized. In the case of the below example, we see that adding the statement “Several sources … eruption” at the beginning or the end of the prompt did not cause any change in the final model response.
| System message | User | Assistant |
| --- | --- | --- |
| You are an AI assistant that helps people find information. | Your task is to verify if the statement "Several sources mention a chance of another large eruption" is supported by a specific quote from the following set of snippets.<br>---<br>SNIPPETS<br>[1] 14 percent chance of megaquake hitting Seattle, experts say SEATTLE - There's a 14 percent chance of a magnitude 9 Cascadia earthquake hitting Seattle in the next 50 years, the U.S. Geological Survey estimates. "Unfortunately, we are unable to...<br>[2] Earthquake experts lay out latest outlook for Seattle's 'Really Big One' "We say that there's approximately a 14% chance of another approximately magnitude-9 earthquake occurring in the next 50 years," said a geophysicist at the University of Washington...<br>---<br>Is the statement "Several sources mention a chance of another large eruption" directly implied or stated by the snippets? | No, the statement is not directly implied or stated by the snippets. The snippets mention a chance of a "megaquake" and a "magnitude 9 Cascadia earthquake" hitting Seattle in the next 50 years, but do not mention a chance of another large eruption. |
Repeat instructions at the end

Models can be susceptible to recency bias, which in this context means that information at the end of the prompt might have more significant influence over the output than information at the beginning of the prompt. Therefore, it is worth experimenting with repeating the instructions at the end of the prompt and evaluating the impact on the generated response.
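One way to experiment with this is a small helper that "sandwiches" the content between two copies of the instruction, so that recency bias works in favor of the instruction. The helper name and separator choice are illustrative:

```python
def sandwich_prompt(instruction, content, separator="---"):
    """Place the instruction both before and after the primary content
    ('double down'), so the last thing the model reads is the task."""
    return f"{instruction}\n{separator}\n{content}\n{separator}\n{instruction}"

p = sandwich_prompt(
    "Summarize the text in one sentence.",
    "Performance reviews are a core component of our approach at Contoso. [...]",
)
```

Compare outputs with and without the trailing instruction to measure whether repetition helps for your model and task.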
Prime the output

This refers to including a few words or phrases at the end of the prompt to obtain a model response that follows the desired form. For example, using a cue such as "Here's a bulleted list of key points:\n- " can help make sure the output is formatted as a list of bullet points.
| System message | User | Assistant |
| --- | --- | --- |
| You are an AI assistant that helps people find information. Answer in as few words as possible. | John Smith is married to Lucy Smith. They have five kids, and he works as a software engineer at Microsoft. What search queries should I do to fact-check this? ## One possible search query is: | "John Smith married Lucy Smith five kids software engineer Microsoft" |
In the above prompt, the text "One possible search query is:" primes the model to produce a single output. Without this cue the model produces several search queries as output.
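Priming can be automated by appending the cue to an otherwise complete prompt; the model then continues from the cue in the desired format. A minimal sketch with an illustrative helper name:

```python
def prime_output(prompt, cue):
    """Append a cue so the model continues from it in the desired form."""
    return prompt.rstrip() + "\n" + cue

primed = prime_output(
    "Summarize the above email message:",
    "Here's a bulleted list of key points:\n- ",
)
```

The completion then naturally starts with the first bullet, because the model is continuing the text it was given.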
Add clear syntax

Using clear syntax for your prompt, including punctuation, headings, and section markers, helps communicate intent and often makes outputs easier to parse.
In the example below, separators (--- in this case) have been added between different sources of information or steps. This allows the use of --- as a stopping condition for generation. In addition, section headings or special variables are presented in uppercase to differentiate them.
| System message | User | Assistant |
| --- | --- | --- |
| You are an AI assistant that helps people find information. | You will read a paragraph, and then issue queries to a search engine in order to fact-check it. Also explain the queries.<br>---<br>PARAGRAPH<br>John Smith is married to Lucy Smith. They have five kids, and he works as a software engineer at Microsoft. What search queries should I do to fact-check this?<br>---<br>QUERIES | 1. "John Smith Microsoft" - To check if John Smith is indeed employed at Microsoft.<br>2. "John Smith Lucy Smith" - To check if John Smith is married to Lucy Smith.<br>3. "John Smith children" - To check if John Smith has five children. |
If you're not sure what syntax to use, consider using Markdown or XML. The models have been trained on a lot of web content in XML and Markdown which may provide better results.
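The separator pattern can be generated programmatically. This sketch assumes you pass the same separator string to your API call as a stop sequence, so generation halts at the next fence; the function name is illustrative:

```python
def fence_sections(sections, separator="---"):
    """Join (label, text) sections with a separator line. Labels are
    uppercased to act as clear section headings; the separator can double
    as a stop sequence in the completion request."""
    parts = []
    for label, text in sections:
        parts.extend([separator, label.upper(), text])
    parts.append(separator)
    return "\n".join(parts)

prompt = fence_sections([
    ("paragraph", "John Smith is married to Lucy Smith. [...]"),
    ("queries", ""),  # empty section: the model fills this in
])
```

Leaving the final section empty is itself a cue: the model's most likely continuation is the content of that section.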
Break the task down

Large language models (LLMs) often perform better if the task is broken down into smaller steps. For example, in the search query prompt referenced earlier, the prompt can be restructured so that the model is first instructed to extract relevant facts, and then instructed to generate search queries that can be used to verify those facts.
| System message | User | Assistant |
| --- | --- | --- |
| You are an AI assistant that helps people find information. | You will read a paragraph, and then issue queries to a search engine in order to fact-check it.<br>---<br>PARAGRAPH<br>John Smith is married to Lucy Smith. They have five kids, and he works as a software engineer at Microsoft. What search queries should I do to fact-check this?<br>---<br>Now you will extract factual claims, and issue queries to fact-check them. When issuing a query, use the function SEARCH("query")<br>FACTUAL CLAIMS | QUERIES<br>SEARCH("John Smith married Lucy Smith")<br>SEARCH("John Smith family size")<br>SEARCH("John Smith Microsoft software engineer") |
Notice the use of clear syntax to differentiate the sections and prime the output. In this simple example, breaking the task down from one to two steps is not very dramatic, but when trying to do this for a larger piece of text with many factual claims, breaking the task down can make a significant difference.
Use of affordances

Sometimes we can get the model to use an affordance instead of relying on its own parameters for information and answers. Search, for example, can be an affordance to help mitigate against fabricated answers, and to get up-to-date information.
One simple way to use an affordance is to stop generation once the affordance calls are generated by the model, then paste the outcomes back into the prompt. Below is an example of what the follow-up call could be after doing the SEARCH calls above. Notice how we pasted the search results into the prompt and replaced the previous SEARCH calls.
| System message | User | Assistant |
| --- | --- | --- |
| You are an AI assistant that helps people find information. | You will read a paragraph, extract factual claims, and then use search engine results to fact-check them<br>---<br>PARAGRAPH<br>John Smith is married to Lucy Smith. They have five kids, and he works as a software engineer at Microsoft. What search queries should I do to fact-check this?<br>---<br>FACTUAL CLAIMS<br>---<br>Here are snippets of the search results:<br>SNIPPETS:<br>[…] have a daughter named Lucy<br>---<br>Given the snippets, fact check each of the factual claims above: | - John Smith is married to Lucy Smith: Confirmed. Snippet [1] and [2] both mention John Smith's wife as Lucy.<br>- John and Lucy have five kids: Confirmed. Snippet [3] mentions that John was accompanied by his two daughters and three sons.<br>- John works as a software engineer at Microsoft: Partially confirmed. Snippet [4] mentions that John spent 10 years at Microsoft, but it does not explicitly state that he currently works there. Further research may be needed to confirm his current employment status. |
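The stop-and-resume flow can be sketched as a loop. Here `model` and `search` are hypothetical stand-ins for your own completion and retrieval functions; the prompt scaffolding mirrors the SEARCH("...") convention used above:

```python
import re

SEARCH_CALL = re.compile(r'SEARCH\("([^"]+)"\)')

def fact_check_with_search(model, search, prompt):
    """Hypothetical affordance loop: let the model emit SEARCH("...") calls,
    run them with a real search function, paste the snippets back into the
    prompt, and ask the model to continue with grounded verdicts."""
    draft = model(prompt)                       # first pass: claims + SEARCH calls
    queries = SEARCH_CALL.findall(draft)
    snippets = "\n".join(search(q) for q in queries)
    follow_up = (
        prompt + "\n" + draft
        + "\n---\nHere are snippets of the search results:\nSNIPPETS:\n" + snippets
        + "\n---\nGiven the snippets, fact check each of the factual claims above:"
    )
    return model(follow_up)                     # second pass: grounded answer
```

In a real deployment, the first call would use the separator or a similar marker as a stop sequence so generation halts once the SEARCH calls are emitted.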
Chain of thought prompting

This is a variation on the break the task down technique. Instead of splitting a task into smaller steps, in this approach the model response is instructed to proceed step-by-step and present all the steps involved. Doing so reduces the possibility of inaccuracy of outcomes and makes assessing the model response easier.
| System message | User | Assistant |
| --- | --- | --- |
| You are an AI assistant that helps people find information. | Who was the most decorated (maximum medals) individual athlete in the Olympic games that were held at Sydney? Take a step-by-step approach in your response, cite sources and give reasoning before sharing final answer in the below format: ANSWER is: <name> | Step 1: Determine the year of the Sydney Olympic games. The Sydney Olympic games were held in the year 2000.<br>Step 2: Find a list of individual athletes who won multiple medals in the 2000 Sydney Olympics. According to the official website of the Olympic Games, there were several individual athletes who won multiple medals in the 2000 Sydney Olympics. Some of the notable names include: […]<br>Step 3: Determine the athlete with the most medals. After reviewing the list of individual athletes who won multiple medals in the 2000 Sydney Olympics, we can see that the athlete with the most medals is Ian Thorpe from Australia, who won a total of 5 medals (3 gold and 2 silver).<br>Therefore, the ANSWER is: Ian Thorpe. |
Specifying the output structure

Using your prompt to specify the structure of the output can have a significant impact on the nature and quality of the results. Sometimes system message inputs such as "only write true facts" or "do not fabricate information" might not be adequate mitigations. Instead, asking for the model response to also include citations can help reduce the prevalence of incorrect responses.
If you instruct the model to cite the source material when it makes statements, those statements are much more likely to be grounded. Asking for citations makes it so that
the model must make two errors every time it generates a response: the first error is the fabricated response, and the second is the bad citation. Note that the closer the citation is to the text it supports, the shorter the distance ahead the model needs to anticipate the citation, which suggests that inline citations are better mitigations for false content generation than citations at the end of the content.
Similarly, if you ask the model to extract factual statements from a paragraph, it may extract compound statements such as ‘X is doing Y AND Z’ (which may be harder to verify). This can be avoided by specifying an output structure such as (entity1, relationship, entity2).
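A further benefit of asking for a structure like (entity1, relationship, entity2) is that the output becomes easy to parse and verify programmatically. A minimal sketch (the regex and function name are illustrative):

```python
import re

# Match three comma-separated fields inside parentheses.
TRIPLE = re.compile(r"\(([^,()]+),([^,()]+),([^,()]+)\)")

def parse_triples(response):
    """Parse '(entity1, relationship, entity2)' tuples from a model response
    that was asked to use that output structure. Each tuple is a single,
    simple claim that can be checked independently."""
    return [tuple(p.strip() for p in m) for m in TRIPLE.findall(response)]
```

Compound free-form statements like "X is doing Y AND Z" would not fit this structure, which nudges the model toward atomic, verifiable claims.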
The examples below show the use of citations and directing the model response to fit a defined structure.
System User Assistant message |
You are an List down the different behaviors exhibited by the Customer 1. Polite Greeting AI assistant Support agent in the following conversation. BEHAVIOR("Agent that helps --- starts the people find hi thank you for calling contoso pharmacy who am i speaking conversation with information with today i understand your frustration mary um can you tell a polite greeting me what exactly you're trying to accomplish OK uh certainly and asks for the happy to check that for you one moment please i see here that caller's name") you were on a generic form of lipitor is that right OK uh so i 2. Empathetic see that your doctor stopped prescribing these drugs in BEHAVIOR("Agent twenty twenty one actually OK uh yeah i'd be happy to check acknowledges that for you uh because sometimes there's a gap in the system Mary's frustration and it just doesn't get reported so let me take a look here just and tries to a moment so i'm seeing here that your doctor had these drugs understand her prescribed to you from twenty twelve through twenty twenty concern") one OK um yeah according to the latest records uh provided 3. Attentive by doctor's office you're now on a different drug metformin BEHAVIOR("Agent would you like us to go ahead and fill that prescription for you listens to the for pick up a bit of form in five hundred milligrams uh you're customer's very welcome mary please let us know if there's anything else problem and asks we can do for you today. for more details") --- 4. Proactive How was the response of the Customer Support agent? BEHAVIOR("Agent Classify and list down their various behaviors and cite the offers to check reason behind the classification, using the format the issue for the BEHAVIOR("reason") customer") 5. Informative BEHAVIOR("Agent provides the relevant information about the |
System User Assistant message |
customer's prescription history")
thanks the customer and offers further assistance if needed") |
System User Assistant message |
System message:
You are an AI assistant that helps people find information.

User:
Cluster the following news headlines into topic categories based on patterns seen within the text. Also mention reasoning behind how these categories were defined.
Output format:
{
    "TOPIC_NAME": "",
    "HEADLINES": [],
    "REASONING": ""
}
Input news headlines:
1. "From books to presentations in 10s with AR + ML"
2. "Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition"
3. "First Order Motion Model applied to animate paintings"
4. "Robinhood and other brokers literally blocking purchase of $GME, $NOK, $BB, $AMC; allow sells"
5. "United Airlines stock down over 5% premarket trading"
6. "Bitcoin was nearly $20,000 a year ago today"
Output:

Assistant:
{
    "ARTIFICIAL_INTELLIGENCE": {
        "HEADLINES": [
            "From books to presentations in 10s with AR + ML",
            "Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition",
            "First Order Motion Model applied to animate paintings"
        ],
        "REASONING": "These headlines are related to artificial intelligence, machine learning, and their applications in various fields."
    },
    "FINANCIAL_MARKETS": {
        "HEADLINES": [
            "Robinhood and other brokers literally blocking purchase of $GME, $NOK, $BB, $AMC; allow sells",
            "United Airlines stock down over 5% premarket trading",
            "Bitcoin was nearly $20,000 a year ago today"
        ],
        "REASONING": "These headlines are related to financial markets, stocks, cryptocurrencies, and trading platforms."
    }
}
Changing the temperature parameter changes the output of the model. The temperature parameter can be set between 0 and 2. A higher value, for example 0.7, makes the output more random and produces more divergent responses, while a lower value, like 0.2, makes the output more focused and concrete. A fictional story could be generated using a higher temperature, whereas a much lower temperature is recommended for generating a legal document. Top_probability (top_p) is another parameter that, like temperature, controls the randomness of the model response, but it does so in a different way. The general recommendation is to alter one of these two parameters at a time, not both.
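For illustration, here is a minimal sketch of how the temperature parameter might be varied by scenario when calling the service with the openai Python library. The helper function and the deployment name are hypothetical, not part of the service:

```python
def completion_params(creative: bool) -> dict:
    """Return sampling settings for a completion request. Only temperature is
    varied; top_p is left at its default, per the recommendation to alter one
    of the two parameters at a time, not both."""
    return {
        "engine": "text-davinci-003",  # your deployment name (illustrative)
        "max_tokens": 200,
        # higher temperature -> more random, divergent output (e.g. fiction);
        # lower temperature -> more focused, concrete output (e.g. legal text)
        "temperature": 0.7 if creative else 0.2,
    }

print(completion_params(creative=True)["temperature"])   # 0.7
print(completion_params(creative=False)["temperature"])  # 0.2
```

The resulting dictionary can then be passed as keyword arguments, for example `openai.Completion.create(prompt=..., **completion_params(creative=True))`.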
One of the most effective ways to provide reliable answers is to give the model data to draw its responses from (grounding data). If your use case relies on up-to-date, reliable information and is not purely a creative scenario, we strongly recommend providing grounding data. In general, the closer you can get your source material to the final form of the answer you want, the less work the model needs to do, which means there is less opportunity for error. Here is an example where the system is provided a recent blog describing the launch of GPT-4 in Azure OpenAI Service, and asked to name some of its early customers.
System message:
You are an AI assistant that helps people find information. You will provide answers exclusively from below text and respond in 100 words or less:
"At Microsoft, we are constantly discovering new ways to unleash creativity, unlock productivity, and uplevel skills so that more people can benefit from using AI. This is allowing our customers to …
… thereby reducing the likelihood that the model will generate a harmful response."1
Answer in 100 words or less. Use bullet lists wherever possible.

User:
Who are some of the early customers of the Azure OpenAI service GPT-4 models?

Assistant:
Here are some of the early customers of the Azure OpenAI service GPT-4 models:
- Epic Healthcare: They plan to use GPT-4 to help physicians and nurses spend less time on data entry and investigation.
- Coursera: They are using Azure OpenAI Service with GPT-4 to create a new AI-powered learning experience on their platform.
- Coca-Cola: They are using Azure OpenAI to build a knowledge hub and plan to leverage GPT-4's multimodal capabilities for marketing, advertising, public relations, and customer relations.
- Other companies of all sizes are using Azure AI to improve customer experiences, summarize long-form content, help write software, and reduce risk by predicting the right tax data.
1 Text from Azure OpenAI GPT-4 launch blog .
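As an illustrative sketch, grounding data can be templated into the system message before each request. The helper below is hypothetical; it simply mirrors the shape of the system message used in the grounding example:

```python
def grounded_system_message(source_text: str, word_limit: int = 100) -> str:
    """Build a system message that restricts answers to the supplied source
    text and caps the response length, as in the grounding example."""
    return (
        "You are an AI assistant that helps people find information. "
        "You will provide answers exclusively from the text below and respond "
        f"in {word_limit} words or less:\n\"{source_text}\""
    )

message = grounded_system_message("At Microsoft, we are constantly discovering "
                                  "new ways to unleash creativity...")
print("exclusively" in message)  # True
```

Keeping the template in one place makes it easy to swap in fresher source material as it becomes available.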
Learn more about Azure OpenAI.
Get started with the ChatGPT model with the ChatGPT quickstart.
For more examples, check out the Azure OpenAI Samples GitHub repository
Article • 05/23/2023
This article provides a recommended framework and example templates to help write an effective system message, sometimes referred to as a metaprompt or system prompt that can be used to guide an AI system’s behavior and improve system performance. If you're new to prompt engineering, we recommend starting with our introduction to prompt engineering and prompt engineering techniques guidance.
This guide provides system message recommendations and resources that, along with other prompt engineering techniques, can help increase the accuracy and grounding of responses you generate with a Large Language Model (LLM). However, it is important to remember that even when using these templates and guidance, you still need to validate the responses the models generate. Just because a carefully crafted system message worked well for a particular scenario doesn't necessarily mean it will work more broadly across other scenarios. Understanding the limitations of LLMs and the mechanisms for evaluating and mitigating those limitations is just as important as understanding how to leverage their strengths.
The LLM system message framework described here covers four concepts:
Define the model’s profile, capabilities, and limitations for your scenario
Define the model’s output format
Provide example(s) to demonstrate the intended behavior of the model
Provide additional behavioral guardrails
Define the specific task(s) you would like the model to complete. Describe who the users of the model will be, what inputs they will provide to the model, and what you expect the model to do with the inputs.
Define how the model should complete the tasks, including any additional tools (like APIs, code, plug-ins) the model can use. If it doesn’t use additional tools, it can rely on its own parametric knowledge.
Define the scope and limitations of the model’s performance. Provide clear instructions on how the model should respond when faced with any limitations. For example, define how the model should respond if prompted on subjects or for uses that are off topic or otherwise outside of what you want the system to do.
Define the posture and tone the model should exhibit in its responses.
When using the system message to define the model’s desired output format in your scenario, consider and include the following types of information:
Define the language and syntax of the output format. If you want the output to be machine parse-able, you may want the output to be in formats like JSON or XML.
Define any styling or formatting preferences for better user or machine readability. For example, you may want relevant parts of the response to be bolded or citations to be in a specific format.
When using the system message to demonstrate the intended behavior of the model in your scenario, it is helpful to provide specific examples. When providing examples, consider the following:
Describe difficult use cases where the prompt is ambiguous or complicated, to give the model additional visibility into how to approach such cases.
Show the potential “inner monologue” and chain-of-thought reasoning to better inform the model on the steps it should take to achieve the desired outcomes.
When defining additional safety and behavioral guardrails, it’s helpful to first identify and prioritize the harms you’d like to address. Depending on the application, the sensitivity and severity of certain harms could be more important than others. Below, we’ve outlined some system message templates that may help mitigate some of the common harms that have been seen with LLMs, such as fabrication of content (that is not grounded or relevant), jailbreaks, and manipulation.
Learn more about Azure OpenAI
Learn more about deploying Azure OpenAI responsibly
For more examples, check out the Azure OpenAI Samples GitHub repository
Article • 03/08/2023 • 5 minutes to read
Use this article to get started with Azure OpenAI with step-by-step instructions to create a resource and deploy a model. While the steps for resource creation and model deployment can be completed in a few minutes, the actual deployment process itself can take more than an hour. You can create your resource, start your deployment, and then check back in on your deployment later rather than actively waiting for the deployment to complete.
An Azure subscription - Create one for free
Access granted to Azure OpenAI in the desired Azure subscription
Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access . If you have a problem, open an issue on this repo to contact us.
Resources in Azure can be created several different ways:
Within the Azure portal
Using the REST APIs, Azure CLI, PowerShell or client libraries
Via ARM templates
This guide walks you through the Azure portal creation experience.
Navigate to the create page: Azure OpenAI Service Create Page
On the Create page provide the following information:
Field Description |
Subscription Select the Azure subscription used in your OpenAI onboarding application |
Resource group The Azure resource group that will contain your OpenAI resource. You can create a new group or add it to a pre-existing group. |
Region The location of your instance. Different locations may introduce latency, but have no impact on the runtime availability of your resource. |
Name A descriptive name for your cognitive services resource. For example, MyOpenAIResource. |
Pricing Tier Only 1 pricing tier is available for the service currently |
Before you can generate text or inference, you need to deploy a model. You can select from one of several available models in Azure OpenAI Studio.
Davinci is the most capable model family and can perform any task the other models can perform and often with less instruction. For applications requiring a lot of understanding of the content, like summarization for a specific audience and creative content generation, Davinci is going to produce the best results.
To deploy a model, follow these steps:
Sign in to Azure OpenAI Studio .
Select the subscription and OpenAI resource to work with.
Select Manage deployments in your resource > Go to Deployments under Manage your deployments and models. You might first need to scroll down on the landing page.
Select Create new deployment from the Management > Deployments page.
Select a model from the drop-down. For getting started in the East US region, we recommend the text-davinci-003 model. In other regions you should start with the text-davinci-002 model. Some models are not available in all regions. For a list of available models per region, see Model Summary table and region availability.
Enter a model name to help you identify the model. Choose a name carefully. The model name will be used as the deployment name via OpenAI client libraries and API.
Select Create to deploy the model.
The deployments table displays a new entry that corresponds to this newly created model. Your deployment status will move to succeeded when the deployment is complete and ready for use.
Now that you have a resource and your first model deployed get started making API calls and generating text with our quickstarts.
Learn more about the underlying models that power Azure OpenAI.
Article • 05/15/2023
The ChatGPT and GPT-4 models are language models that are optimized for conversational interfaces. The models behave differently than the older GPT-3 models. Previous models were text-in and text-out, meaning they accepted a prompt string and returned a completion to append to the prompt. However, the ChatGPT and GPT-4 models are conversation-in and message-out. The models expect input formatted in a specific chat-like transcript format, and return a completion that represents a model-written message in the chat. While this format was designed specifically for multi-turn conversations, you'll find it can also work well for non-chat scenarios too.
In Azure OpenAI there are two different options for interacting with these types of models:
Chat Completion API.
Completion API with Chat Markup Language (ChatML).
The Chat Completion API is a new dedicated API for interacting with the ChatGPT and GPT-4 models. This API is the preferred method for accessing these models. It is also the only way to access the new GPT-4 models.
ChatML uses the same completion API that you use for other models like text-davinci-002, but it requires a unique token-based prompt format known as Chat Markup Language (ChatML). This provides lower-level access than the dedicated Chat Completion API, but also requires additional input validation, only supports ChatGPT (gpt-35-turbo) models, and the underlying format is more likely to change over time.
This article walks you through getting started with the new ChatGPT and GPT-4 models. It's important to use the techniques described here to get the best results. If you try to interact with the models the same way you did with the older model series, the models will often be verbose and provide less useful responses.
The following code snippet shows the most basic way to use the ChatGPT and GPT-4 models with the Chat Completion API. If this is your first time using these models programmatically, we recommend starting with our ChatGPT & GPT-4 Quickstart.
GPT-4 models are currently only available by request. Existing Azure OpenAI customers can apply for access by filling out this form .
Python

import os
import openai

openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("OPENAI_API_BASE")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
    ]
)

print(response)
print(response['choices'][0]['message']['content'])
Output

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The founders of Microsoft are Bill Gates and Paul Allen. They co-founded the company in 1975.",
        "role": "assistant"
      }
    }
  ],
  "created": 1679014551,
  "id": "chatcmpl-6usfn2yyjkbmESe3G4jaQR6bsScO1",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 86,
    "prompt_tokens": 37,
    "total_tokens": 123
  }
}
Note

The following parameters aren't available with the new ChatGPT and GPT-4 models: logprobs, best_of, and echo. If you set any of these parameters, you'll get an error.
Every response includes a finish_reason. The possible values for finish_reason are:

stop: API returned complete model output.
length: Incomplete model output due to max_tokens parameter or token limit.
content_filter: Omitted content due to a flag from our content filters.
null: API response still in progress or incomplete.
Consider setting max_tokens to a slightly higher value than normal, such as 300 or 500. This ensures that the model doesn't stop generating text before it reaches the end of the message.
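Checking finish_reason after each call makes truncation visible so you can retry with a larger max_tokens. A minimal sketch, assuming the response dictionary shape returned by openai.ChatCompletion.create (the helper name is illustrative):

```python
def was_truncated(response: dict) -> bool:
    """Return True when the model stopped because of the max_tokens parameter
    or the token limit (finish_reason == "length")."""
    return response["choices"][0]["finish_reason"] == "length"

# A response with finish_reason "stop" means the model returned complete output.
sample = {"choices": [{"finish_reason": "stop", "index": 0,
                       "message": {"role": "assistant", "content": "..."}}]}
print(was_truncated(sample))  # False
```

A caller might, for example, log truncated responses or re-issue the request with a higher max_tokens when this returns True.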
Note

gpt-35-turbo is equivalent to the gpt-3.5-turbo model from OpenAI.

Unlike previous GPT-3 and GPT-3.5 models, the gpt-35-turbo model as well as the gpt-4 and gpt-4-32k models will continue to be updated. When creating a deployment of these models, you'll also need to specify a model version.

Currently, only version 0301 is available for ChatGPT and 0314 for GPT-4 models. We'll continue to make updated versions available in the future. You can find model deprecation times on our models page.
OpenAI trained the ChatGPT and GPT-4 models to accept input formatted as a conversation. The messages parameter takes an array of dictionaries with a conversation organized by role.
The format of a basic Chat Completion is as follows:
{"role": "system", "content": "Provide some context and/or instructions to the model"},
{"role": "user", "content": "The user's message goes here"}
A conversation with one example answer followed by a question would look like:
{"role": "system", "content": "Provide some context and/or instructions to the model."},
{"role": "user", "content": "Example question goes here."},
{"role": "assistant", "content": "Example answer goes here."},
{"role": "user", "content": "First question/message for the model to actually respond to."}
System role
The system role, also known as the system message, is included at the beginning of the array. This message provides the initial instructions to the model. You can provide various information in the system role including:
A brief description of the assistant
Personality traits of the assistant
Instructions or rules you would like the assistant to follow
Data or information needed for the model, such as relevant questions from an FAQ
You can customize the system role for your use case or just include basic instructions. The system role/message is optional, but it's recommended to at least include a basic one to get the best results.
Messages
After the system role, you can include a series of messages between the user and the
assistant.
{"role": "user", "content": "What is thermodynamics?"}
To trigger a response from the model, you should end with a user message indicating that it's the assistant's turn to respond. You can also include a series of example messages between the user and the assistant as a way to do few shot learning.
Message prompt examples
The following section shows examples of different styles of prompts that you could use with the ChatGPT and GPT-4 models. These examples are just a starting point, and you can experiment with different prompts to customize the behavior for your own use cases.
Basic example
If you want the ChatGPT model to behave similarly to chat.openai.com , you can use a basic system message like "Assistant is a large language model trained by OpenAI."
{"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
{"role": "user", "content": "Who were the founders of Microsoft?"}
Example with instructions
For some scenarios, you may want to give additional instructions to the model to define guardrails for what the model is able to do.
{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer their tax related questions. Instructions:
- Only answer questions related to taxes.
- If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users go to the IRS website for more information."},
{"role": "user", "content": "When are my taxes due?"}
Using data for grounding
You can also include relevant data or information in the system message to give the model extra context for the conversation. If you only need to include a small amount of information, you can hard code it in the system message. If you have a large amount of
data that the model should be aware of, you can use embeddings or a product like Azure Cognitive Search to retrieve the most relevant information at query time.
{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you're not sure of an answer, you can say 'I don't know'. Context:
principles that put people first. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes requiring applicants to show well-defined use cases, incorporating Microsoft's principles for responsible AI use."},
{"role": "user", "content": "What is Azure OpenAI Service?"}
Few shot learning with Chat Completion
You can also give few shot examples to the model. The approach for few shot learning has changed slightly because of the new prompt format. You can now include a series of messages between the user and the assistant in the prompt as few shot examples. These examples can be used to seed answers to common questions to prime the model or teach particular behaviors to the model.
This is only one example of how you can use few shot learning with ChatGPT and GPT-4. You can experiment with different approaches to see what works best for your use case.
{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer their tax related questions."},
{"role": "user", "content": "When do I need to file my taxes by?"},
{"role": "assistant", "content": "In 2023, you will need to file your taxes by April 18th. The date falls after the usual April 15th deadline because
{"role": "user", "content": "How can I check the status of my tax refund?"},
Using Chat Completion for non-chat scenarios
The Chat Completion API is designed to work with multi-turn conversations, but it also works well for non-chat scenarios.
For example, for an entity extraction scenario, you might use the following prompt:
{"role": "system", "content": "You are an assistant designed to extract entities from text. Users will paste in a string of text and you will respond with entities you've extracted from the text as a JSON object. Here's an example of your output format: { "name": "", "company": "", "phone_number": "" }"},
{"role": "user", "content": "Hello. My name is Robert Smith. I'm calling from Contoso Insurance, Delaware. My colleague mentioned that you are interested in learning about our comprehensive benefits policy. Could you give me a call back at (555) 346-9322 when you get a chance so we can go over the benefits?"}
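Because the system message pins the reply to a JSON object, the extraction result can be parsed and checked before use. This is an illustrative sketch; the expected keys come from the example output format, and the helper name is our own:

```python
import json

EXPECTED_KEYS = {"name", "company", "phone_number"}

def parse_entities(model_reply: str) -> dict:
    """Parse the assistant's JSON reply and fail loudly if a key is missing,
    rather than silently passing malformed output downstream."""
    entities = json.loads(model_reply)
    missing = EXPECTED_KEYS - entities.keys()
    if missing:
        raise ValueError(f"model reply missing keys: {missing}")
    return entities

reply = '{"name": "Robert Smith", "company": "Contoso Insurance", "phone_number": "(555) 346-9322"}'
print(parse_entities(reply)["name"])  # Robert Smith
```

Validating the structure at the boundary keeps the rest of the application from having to handle free-form text.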
The examples so far have shown you the basic mechanics of interacting with the Chat Completion API. This example shows you how to create a conversation loop that performs the following actions:
Continuously takes console input, and properly formats it as part of the messages array as user role content.
Outputs responses that are printed to the console and formatted and added to the messages array as assistant role content.
This means that every time a new question is asked, a running transcript of the conversation so far is sent along with the latest question. Since the model has no memory, you need to send an updated transcript with each new question or the model will lose context of the previous questions and answers.
Python

import os
import openai

openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("OPENAI_API_BASE")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("OPENAI_API_KEY")

conversation = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input()
    conversation.append({"role": "user", "content": user_input})

    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo",  # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
        messages=conversation
    )

    conversation.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
    print("\n" + response['choices'][0]['message']['content'] + "\n")
When you run the code above you will get a blank console window. Enter your first question in the window and then hit enter. Once the response is returned, you can repeat the process and keep asking questions.
The previous example will run until you hit the model's token limit. With each question asked, and answer received, the messages array grows in size. The token limit for gpt-35-turbo is 4096 tokens, whereas the token limits for gpt-4 and gpt-4-32k are 8192 and 32768 respectively. These limits include the token count from both the message array sent and the model response. The number of tokens in the messages array combined with the value of the max_tokens parameter must stay under these limits or you'll receive an error.
It's your responsibility to ensure the prompt and completion falls within the token limit. This means that for longer conversations, you need to keep track of the token count and only send the model a prompt that falls within the limit.
The following code sample shows a simple chat loop example with a technique for handling a 4096 token count using OpenAI's tiktoken library.
The code requires tiktoken 0.3.0. If you have an older version, run pip install tiktoken --upgrade.
Python

import tiktoken
import openai
import os

openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("OPENAI_API_BASE")  # Your Azure OpenAI resource's endpoint value.
openai.api_key = os.getenv("OPENAI_API_KEY")

system_message = {"role": "system", "content": "You are a helpful assistant."}
max_response_tokens = 250
token_limit = 4096
conversation = []
conversation.append(system_message)

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens

while True:
    user_input = input("")
    conversation.append({"role": "user", "content": user_input})
    conv_history_tokens = num_tokens_from_messages(conversation)

    while conv_history_tokens + max_response_tokens >= token_limit:
        del conversation[1]
        conv_history_tokens = num_tokens_from_messages(conversation)

    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo",  # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
        messages=conversation,
        temperature=.7,
        max_tokens=max_response_tokens,
    )

    conversation.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
    print("\n" + response['choices'][0]['message']['content'] + "\n")
In this example, once the token count is reached, the oldest messages in the conversation transcript are removed. del is used instead of pop() for efficiency, and we start at index 1 so as to always preserve the system message and only remove user/assistant messages. Over time, this method of managing the conversation can cause the conversation quality to degrade as the model gradually loses context of the earlier portions of the conversation.
An alternative approach is to limit the conversation duration to the max token length or a certain number of turns. Once the max token limit is reached and the model would lose context if you were to allow the conversation to continue, you can prompt the user that they need to begin a new conversation and clear the messages array to start a brand new conversation with the full token limit available.
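That alternative can be sketched as follows; the helper name and return convention are illustrative, not part of any SDK:

```python
def check_and_reset(conversation, conv_history_tokens, token_limit=4096,
                    max_response_tokens=250):
    """If the next response could exceed the token limit, keep only the system
    message and signal the caller to prompt the user for a fresh conversation."""
    if conv_history_tokens + max_response_tokens >= token_limit:
        system_message = conversation[0]
        return [system_message], True   # reset: full token budget available again
    return conversation, False

conversation = [{"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "..."}]
trimmed, reset = check_and_reset(conversation, conv_history_tokens=4000)
print(reset, len(trimmed))  # True 1
```

When the second return value is True, the application would tell the user that a new conversation has started rather than silently dropping their earlier messages.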
The token counting portion of the code demonstrated previously is a simplified version of one of OpenAI's cookbook examples .
Learn more about Azure OpenAI.
Get started with the ChatGPT model with the ChatGPT quickstart.
For more examples, check out the Azure OpenAI Samples GitHub repository
Article • 02/17/2023 • 16 minutes to read
The completions endpoint can be used for a wide variety of tasks. It provides a simple but powerful text-in, text-out interface to any of our models. You input some text as a prompt, and the model will generate a text completion that attempts to match whatever context or pattern you gave it. For example, if you give the API the prompt, "As Descartes said, I think, therefore", it will return the completion " I am" with high probability.
The best way to start exploring completions is through our playground in Azure OpenAI Studio . It's a simple text box where you can submit a prompt to generate a completion. You can start with a simple example like the following:
write a tagline for an ice cream shop
Once you submit, you'll see something like the following generated:

Console

write a tagline for an ice cream shop
we serve up smiles with every scoop!
The actual completion results you see may differ because the API is stochastic by default. In other words, you might get a slightly different completion every time you call it, even if your prompt stays the same. You can control this behavior with the temperature setting.
This simple, "text in, text out" interface means you can "program" the model by providing instructions or just a few examples of what you'd like it to do. Its success generally depends on the complexity of the task and quality of your prompt. A general rule is to think about how you would write a word problem for a middle school student to solve. A well-written prompt provides enough information for the model to know what you want and how it should respond.
Note
Keep in mind that the models' training data cuts off in October 2019, so they may not have knowledge of current events. We plan to add more continuous training in the future.
Basics
OpenAI's models can do everything from generating original stories to performing complex text analysis. Because they can do so many things, you have to be explicit in showing what you want. Showing, not just telling, is often the secret to a good prompt.
The models try to predict what you want from the prompt. If you send the words "Give me a list of cat breeds," the model wouldn't automatically assume that you're asking for a list of cat breeds. You could as easily be asking the model to continue a conversation where the first words are "Give me a list of cat breeds" and the next ones are "and I'll tell you which ones I like." If the model only assumed that you wanted a list of cats, it wouldn't be as good at content creation, classification, or other tasks.
There are three basic guidelines to creating prompts:
Show and tell. Make it clear what you want either through instructions, examples, or a combination of the two. If you want the model to rank a list of items in alphabetical order or to classify a paragraph by sentiment, show it that's what you want.
Provide quality data. If you're trying to build a classifier or get the model to follow a pattern, make sure that there are enough examples. Be sure to proofread your examples. The model is usually smart enough to see through basic spelling mistakes and still give you a response, but it might also assume the mistakes are intentional, which can affect the response.
Check your settings. The temperature and top_p settings control how deterministic the model is in generating a response. If you're asking it for a response where there's only one right answer, then you'd want to set these settings to lower values. If you're looking for a response that's not obvious, then you might want to set them to higher values. The number one mistake people make with these settings is assuming that they're "cleverness" or "creativity" controls.
Troubleshooting
If you're having trouble getting the API to perform as expected, follow this checklist:
Is it clear what the intended generation should be?
Are there enough examples?
Did you check your examples for mistakes? (The API won't tell you directly)
Are you using temp and top_p correctly?
Classification

To create a text classifier with the API, we provide a description of the task and a few examples. In this demonstration, we show the API how to classify the sentiment of tweets.
Console

This is a tweet sentiment classifier

Tweet: "I loved the new Batman movie!"
Sentiment: Positive

Tweet: "I hate it when my phone battery dies."
Sentiment: Negative

Tweet: "My day has been 👍"
Sentiment: Positive

Tweet: "This is the link to the article"
Sentiment: Neutral

Tweet: "This new music video blew my mind"
Sentiment:
It's worth paying attention to several features in this example:
Use plain language to describe your inputs and outputs We use plain language for the input "Tweet" and the expected output "Sentiment." For best practices, start with plain language descriptions. While you can often use shorthand or keys to indicate the input and output, when building your prompt it's best to start by being as descriptive as possible and then working backwards removing extra words as long as the performance to the prompt is consistent.
Show the API how to respond to any case In this example we provide multiple outcomes "Positive", "Negative" and "Neutral." A neutral outcome is important because there will be many cases where even a human would have a hard time determining if something is positive or negative and situations where it's neither.
You can use text and emoji The classifier is a mix of text and emoji 👍. The API reads emoji and can even convert expressions to and from them.
You need fewer examples for familiar tasks For this classifier we only provided a handful of examples. This is because the API already has an understanding of sentiment
and the concept of a tweet. If you're building a classifier for something the API might not be familiar with, it might be necessary to provide more examples.
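If you assemble prompts like this in code, the few-shot structure is easy to template. The helper below is a hypothetical sketch (build_sentiment_prompt is our own name, not part of any SDK) that builds the classifier prompt from labeled examples:

```python
# Hypothetical helper: build the few-shot sentiment prompt shown above
# from a list of (tweet, label) examples plus the tweet to classify.
def build_sentiment_prompt(examples, new_tweet):
    lines = ["This is a tweet sentiment classifier"]
    for tweet, label in examples:
        lines.append(f'Tweet: "{tweet}"')
        lines.append(f"Sentiment: {label}")
    # End with an unanswered "Sentiment:" cue for the model to complete.
    lines.append(f'Tweet: "{new_tweet}"')
    lines.append("Sentiment:")
    return "\n".join(lines)

prompt = build_sentiment_prompt(
    [("I loved the new Batman movie!", "Positive"),
     ("This is the link to the article", "Neutral")],
    "This new music video blew my mind",
)
```

The resulting string can then be passed as the prompt of a completion call.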
Improving the classifier's efficiency
Now that we have a grasp of how to build a classifier, let's take that example and make it even more efficient so that we can use it to get multiple results back from one API call.
Console

This is a tweet sentiment classifier

Tweet: "I loved the new Batman movie!"
Sentiment: Positive

Tweet: "I hate it when my phone battery dies"
Sentiment: Negative

Tweet: "My day has been 👍"
Sentiment: Positive

Tweet: "This is the link to the article"
Sentiment: Neutral

Tweet text
1. "I loved the new Batman movie!"
2. "I hate it when my phone battery dies"
3. "My day has been 👍"
4. "This is the link to the article"
5. "This new music video blew my mind"

Tweet sentiment ratings:
1: Positive
2: Negative
3: Positive
4: Neutral
5: Positive

Tweet text
1. "I can't stand homework"
2. "This sucks. I'm bored 😠"
3. "I can't wait for Halloween!!!"
4. "My cat is adorable ❤❤"
5. "I hate chocolate"

Tweet sentiment ratings:
1.
After showing the API how tweets are classified by sentiment we then provide it a list of tweets and then a list of sentiment ratings with the same number index. The API is able
to pick up from the first example how a tweet is supposed to be classified. In the second example it sees how to apply this to a list of tweets. This allows the API to rate five (and even more) tweets in just one API call.
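The batched format can likewise be generated and parsed programmatically. Both helpers below are hypothetical names for illustration, not part of any SDK:

```python
# Hypothetical helper: render a numbered list of tweets followed by the
# "Tweet sentiment ratings:" cue, mirroring the prompt above.
def build_batch_section(tweets):
    numbered = "\n".join(f'{i}. "{t}"' for i, t in enumerate(tweets, start=1))
    return "Tweet text\n" + numbered + "\n\nTweet sentiment ratings:\n1."

# Hypothetical helper: parse "1: Positive" style ratings from the
# completion text into a {index: label} dictionary.
def parse_ratings(completion_text):
    ratings = {}
    for line in completion_text.splitlines():
        index, _, label = line.partition(":")
        if index.strip().isdigit() and label.strip():
            ratings[int(index)] = label.strip()
    return ratings
```

Parsing the ratings back into a dictionary lets you pair each label with its original tweet by index.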
It's important to note that when you ask the API to create lists or evaluate text you need to pay extra attention to your probability settings (Top P or Temperature) to avoid drift.
Make sure your probability setting is calibrated correctly by running multiple tests.
Don't make your list too long or the API is likely to drift.
Generation

One of the most powerful yet simplest tasks you can accomplish with the API is generating new ideas or versions of input. You can give the API a list of a few story ideas and it will try to add to that list. We've seen it create business plans, character descriptions and marketing slogans just by providing it a handful of examples. In this demonstration we'll use the API to create more examples for how to use virtual reality in the classroom:
Console

Ideas involving education and virtual reality

1. Virtual Mars
Students get to explore Mars via virtual reality and go on missions to collect and catalog what they see.

2.

All we had to do in this example is provide the API with a description of what the list is about and one example. We then prompted the API with the number 2, indicating that it's a continuation of the list.
Although this is a very simple prompt, there are several details worth noting:
We explained the intent of the list
Just like with the classifier, we tell the API up front what the list is about. This helps it focus on completing the list and not trying to guess what the pattern is behind it.
Our example sets the pattern for the rest of the list
Because we provided a one-sentence description, the API is going to try to follow that pattern for the rest of the items it adds to the list. If we want a more verbose response, we need to set that up from the start.
We prompt the API by adding an incomplete entry
When the API sees 2. and the prompt abruptly ends, the first thing it tries to do is figure out what should come after it. Since we already had an example with number one and gave the list a title, the most obvious response is to continue adding items to the list.
Advanced generation techniques
You can improve the quality of the responses by making a longer more diverse list in your prompt. One way to do that is to start off with one example, let the API generate more and select the ones that you like best and add them to the list. A few more high-quality variations can dramatically improve the quality of the responses.
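That curation loop can be sketched as follows; generate_fn and keep_fn are hypothetical placeholders for an API call and your own selection step:

```python
# Sketch of the iterative workflow described above: generate a candidate,
# keep it only if you like it, and fold kept candidates back into the prompt.
def grow_idea_list(seed_ideas, generate_fn, keep_fn, rounds=3):
    ideas = list(seed_ideas)
    for _ in range(rounds):
        prompt = "Ideas involving education and virtual reality\n\n"
        prompt += "\n".join(f"{i}. {idea}" for i, idea in enumerate(ideas, 1))
        prompt += f"\n{len(ideas) + 1}."   # cue the next numbered item
        candidate = generate_fn(prompt)    # stand-in for a completion call
        if keep_fn(candidate):             # keep only the variations you like
            ideas.append(candidate)
    return ideas

ideas = grow_idea_list(
    ["Virtual field trips to museums"],
    generate_fn=lambda prompt: "Simulated chemistry labs",
    keep_fn=lambda idea: len(idea) > 0,
    rounds=2,
)
```

Each kept idea enriches the prompt for the next round, which is what improves the diversity of later generations.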
Conversation

The API is extremely adept at carrying on conversations with humans and even with itself. With just a few lines of instruction, we've seen the API perform as a customer service chatbot that intelligently answers questions without ever getting flustered or a wise-cracking conversation partner that makes jokes and puns. The key is to tell the API how it should behave and then provide a few examples.
Here's an example of the API playing the role of an AI answering questions:
Console

The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

Human: Hello, who are you?
AI: I am an AI created by OpenAI. How can I help you today?
Human:
This is all it takes to create a chatbot capable of carrying on a conversation. But underneath its simplicity there are several things going on that are worth paying attention to:
We tell the API the intent but we also tell it how to behave Just like the other prompts, we cue the API into what the example represents, but we also add another key detail: we give it explicit instructions on how to interact with the phrase "The assistant is helpful, creative, clever, and very friendly."
Without that instruction the API might stray and mimic the human it's interacting with and become sarcastic or some other behavior we want to avoid.
We give the API an identity At the start we have the API respond as an AI that was created by OpenAI. While the API has no intrinsic identity, this helps it respond in a way that's as close to the truth as possible. You can use identity in other ways to create other kinds of chatbots. If you tell the API to respond as a woman who works as a research scientist in biology, you'll get intelligent and thoughtful comments from the API similar to what you'd expect from someone with that background.
In this example we create a chatbot that is a bit sarcastic and reluctantly answers questions:
Console

Marv is a chatbot that reluctantly answers questions.
###
User: How many pounds are in a kilogram?
Marv: This again? There are 2.2 pounds in a kilogram. Please make a note of this.
###
User: What does HTML stand for?
Marv: Was Google too busy? Hypertext Markup Language. The T is for try to ask better questions in the future.
###
User: When did the first airplane fly?
Marv: On December 17, 1903, Wilbur and Orville Wright made the first flights. I wish they'd come and take me away.
###
User: Who was the first man in space?
Marv:
To create an amusing and somewhat helpful chatbot we provide a few examples of questions and answers showing the API how to reply. All it takes is just a few sarcastic responses and the API is able to pick up the pattern and provide an endless number of snarky responses.
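To keep a running conversation in this format, you can append each exchange to the prompt. build_chat_prompt is a hypothetical helper mirroring the "###"-separated layout above, not part of any SDK:

```python
# Hypothetical helper: persona description first, then "###"-separated
# turns, ending with the bot's name as a completion cue.
def build_chat_prompt(persona, turns, new_question, bot_name="Marv"):
    parts = [persona]
    for user_msg, bot_msg in turns:
        parts.append(f"User: {user_msg}\n{bot_name}: {bot_msg}")
    # Leave the final bot line empty so the model fills it in.
    parts.append(f"User: {new_question}\n{bot_name}:")
    return "\n###\n".join(parts)

prompt = build_chat_prompt(
    "Marv is a chatbot that reluctantly answers questions.",
    [("What does HTML stand for?", "Hypertext Markup Language.")],
    "Who was the first man in space?",
)
```

After each reply, append the new (question, answer) pair to turns so the chatbot keeps its context and tone.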
Transformation

The API is a language model that is familiar with a variety of ways that words and characters can be used to express information. This ranges from natural language text to code and languages other than English. The API is also able to understand content on a level that allows it to summarize, convert and express it in different ways.
Translation
In this example we show the API how to convert from English to French:
Console

English: I do not speak French.
French: Je ne parle pas français.
English: See you later!
French: À tout à l'heure!
English: Where is a good restaurant?
French: Où est un bon restaurant?
English: What rooms do you have available?
French: Quelles chambres avez-vous de disponible?
English:
This example works because the API already has a grasp of French, so there's no need to try to teach it this language. Instead, we just need to provide enough examples that the API understands it's converting from one language to another.
If you want to translate from English to a language the API is unfamiliar with you'd need to provide it with more examples and a fine-tuned model to do it fluently.
Conversion
In this example we convert the name of a movie into emoji. This shows the adaptability of the API to picking up patterns and working with other characters.
Console

Back to Future: 👨👴🚗🕒
Batman: 🤵🦇
Transformers: 🚗🤖
Wonder Woman: 👸🏻👸🏼👸🏽👸🏾👸🏿
Winnie the Pooh: 🐻🐼🐻
The Godfather: 👨👩👧🕵🏻‍♂️👲💥
Game of Thrones: 🏹🗡🗡🏹
Spider-Man: 🕸🕷🕸🕸🕷🕸
Spider-Man:
Summarization

The API is able to grasp the context of text and rephrase it in different ways. In this example, the API takes a block of text and creates an explanation a child would understand. This illustrates that the API has a deep grasp of language.
Console

My ten-year-old asked me what this passage means:
"""
A neutron star is the collapsed core of a massive supergiant star, which had a total mass of between 10 and 25 solar masses, possibly more if the star was especially metal-rich.[1] Neutron stars are the smallest and densest stellar objects, excluding black holes and hypothetical white holes, quark stars, and strange stars.[2] Neutron stars have a radius on the order of 10 kilometres (6.2 mi) and a mass of about 1.4 solar masses.[3] They result from the supernova explosion of a massive star, combined with gravitational collapse, that compresses the core past white dwarf star density to that of atomic nuclei.
"""
I rephrased it for him, in plain language a ten-year-old can understand:
"""
In this example we place whatever we want summarized between the triple quotes. It's worth noting that we explain both before and after the text to be summarized what our intent is and who the target audience is for the summary. This is to keep the API from drifting after it processes a large block of text.
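The triple-quote template can be filled in programmatically; build_summary_prompt is a hypothetical helper (our own name, not an SDK function) that wraps arbitrary text in the same before-and-after framing:

```python
# Hypothetical helper: wrap a passage in the summarization template above,
# stating the intent both before and after the quoted text to limit drift.
def build_summary_prompt(passage):
    return (
        "My ten-year-old asked me what this passage means:\n"
        '"""\n' + passage.strip() + '\n"""\n'
        "I rephrased it for him, in plain language a ten-year-old can understand:\n"
        '"""\n'
    )

prompt = build_summary_prompt("A neutron star is the collapsed core of a massive star.")
```

The trailing open triple quote cues the model to produce the rephrased passage, and a closing `"""` makes a natural stop sequence.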
Completion

While all prompts result in completions, it can be helpful to think of text completion as its own task in instances where you want the API to pick up where you left off. For example, if given this prompt, the API will continue the train of thought about vertical farming. You can lower the temperature setting to keep the API more focused on the intent of the prompt or increase it to let it go off on a tangent.
Console

Vertical farming provides a novel solution for producing food locally, reducing transportation costs and
This next prompt shows how you can use completion to help write React components. We send some code to the API, and it's able to continue the rest because it has an understanding of the React library. We recommend using models from our Codex series for tasks that involve understanding or generating code. Currently, we support two Codex models: code-davinci-002 and code-cushman-001. For more information about Codex models, see the Codex models section in Models.
JavaScript

import React from 'react';
const HeaderComponent = () => (
Factual responses

The API has a lot of knowledge that it's learned from the data it was trained on. It also has the ability to provide responses that sound very real but are in fact made up. There are two ways to limit the likelihood of the API making up an answer.
Provide a ground truth for the API If you provide the API with a body of text to answer questions about (like a Wikipedia entry) it will be less likely to confabulate a response.
Use a low probability and show the API how to say "I don't know" If the API understands that in cases where it's less certain about a response that saying "I don't know" or some variation is appropriate, it will be less inclined to make up answers.
In this example we give the API examples of questions and answers it knows and then examples of things it wouldn't know and provide question marks. We also set the probability to zero so the API is more likely to respond with a "?" if there's any doubt.
Console

Q: Who is Batman?
A: Batman is a fictional comic book character.

Q: What is torsalplexity?
A: ?

Q: What is Devz9?
A: ?

Q: Who is George Lucas?
A: George Lucas is an American film director and producer famous for creating Star Wars.

Q: What is the capital of California?
A: Sacramento.

Q: What orbits the Earth?
A: The Moon.

Q: Who is Fred Rickerson?
A: ?

Q: What is an atom?
A: An atom is a tiny particle that makes up everything.

Q: Who is Alvan Muntz?
A: ?

Q: What is Kozar-09?
A: ?

Q: How many moons does Mars have?
A: Two, Phobos and Deimos.

Q:
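Client code can lean on that convention. answer_or_unknown below is a minimal hypothetical helper (not part of any SDK) that treats the model's "?" reply as an explicit "unknown" signal:

```python
# Hypothetical helper: interpret the "?" convention from the prompt above.
def answer_or_unknown(completion_text):
    answer = completion_text.strip()
    # A lone "?" means the model is signaling that it doesn't know.
    return None if answer == "?" else answer
```

Returning None for "?" lets downstream code fall back to a search or a canned response instead of surfacing a guess.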
Working with code

The Codex model series is a descendant of OpenAI's base GPT-3 series that's been trained on both natural language and billions of lines of code. It's most capable in Python and proficient in over a dozen languages including C#, JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and even Shell.
Learn more about generating code completions in the working with code (Codex) guide, and learn more about the underlying models that power Azure OpenAI.
Article • 05/26/2023
While OpenAI and Azure OpenAI Service rely on a common Python client library, there are small changes you need to make to your code in order to swap back and forth between endpoints. This article walks you through the common changes and differences you'll experience when working across OpenAI and Azure OpenAI.
Note
This library is maintained by OpenAI and is currently in preview. Refer to the release history or the version.py commit history to track the latest updates to the library.
We recommend using environment variables. If you haven't done this before, our Python quickstarts walk you through this configuration.
API key
OpenAI

Python

import openai

openai.api_key = "sk-..."
openai.organization = "..."

Azure OpenAI

Python

import openai

openai.api_type = "azure"
openai.api_key = "..."
openai.api_base = "https://example-endpoint.openai.azure.com"
openai.api_version = "2023-05-15"  # subject to change
Azure Active Directory authentication
OpenAI

Python

import openai

openai.api_key = "sk-..."
openai.organization = "..."

Azure OpenAI

Python

import openai
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

openai.api_type = "azuread"
openai.api_key = token.token
openai.api_base = "https://example-endpoint.openai.azure.com"
openai.api_version = "2023-05-15"  # subject to change
Keyword argument for model

OpenAI uses the model keyword argument to specify what model to use. Azure OpenAI has the concept of deployments and uses the deployment_id keyword argument to describe which model deployment to use. Azure OpenAI also supports the use of engine interchangeably with deployment_id. For OpenAI, engine still works in most instances, but it's deprecated and model is preferred.
OpenAI

Python

completion = openai.Completion.create(
    prompt="<prompt>",
    model="text-davinci-003"
)

chat_completion = openai.ChatCompletion.create(
    messages="<messages>",
    model="gpt-4"
)

embedding = openai.Embedding.create(
    input="<input>",
    model="text-embedding-ada-002"
)

Azure OpenAI

Python

completion = openai.Completion.create(
    prompt="<prompt>",
    deployment_id="text-davinci-003"
    #engine="text-davinci-003"
)

chat_completion = openai.ChatCompletion.create(
    messages="<messages>",
    deployment_id="gpt-4"
    #engine="gpt-4"
)

embedding = openai.Embedding.create(
    input="<input>",
    deployment_id="text-embedding-ada-002"
    #engine="text-embedding-ada-002"
)
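The calls differ only in which keyword argument carries the model or deployment name, so a small wrapper can hide the difference. completion_kwargs below is a hypothetical helper for illustration, not part of either SDK:

```python
# Hypothetical helper: build keyword arguments for a completion call
# depending on whether you're targeting OpenAI or Azure OpenAI.
def completion_kwargs(prompt, name, use_azure):
    kwargs = {"prompt": prompt}
    if use_azure:
        kwargs["deployment_id"] = name  # Azure OpenAI: deployment name
    else:
        kwargs["model"] = name          # OpenAI: model name
    return kwargs
```

The returned dictionary can be splatted into the create call, e.g. openai.Completion.create(**completion_kwargs(...)).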
Azure OpenAI embeddings multiple input support

Many examples show passing multiple inputs into the embeddings API in a single call. With Azure OpenAI, you currently must pass a single text input per call.
OpenAI

Python

inputs = ["A", "B", "C"]

embedding = openai.Embedding.create(
    input=inputs,
    model="text-embedding-ada-002"
)

Azure OpenAI

Python

inputs = ["A", "B", "C"]

for text in inputs:
    embedding = openai.Embedding.create(
        input=text,
        deployment_id="text-embedding-ada-002"
        #engine="text-embedding-ada-002"
    )
Learn more about how to work with ChatGPT and the GPT-4 models with our how-to guide.
For more examples, check out the Azure OpenAI Samples GitHub repository.
Article • 02/17/2023 • 10 minutes to read
The Codex model series is a descendant of our GPT-3 series that's been trained on both natural language and billions of lines of code. It's most capable in Python and proficient in over a dozen languages including C#, JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and even Shell.
You can use Codex for a variety of tasks including:
Turn comments into code
Complete your next line or function in context
Bring knowledge to you, such as finding a useful library or API call for an application
Add comments
Rewrite code for efficiency
Here are a few examples of using Codex that can be tested in Azure OpenAI Studio's playground with a deployment of a Codex series model, such as code-davinci-002 .
Saying "Hello" (Python)
Python

"""
Ask the user for their name and say "Hello"
"""
Create random names (Python)
Python

"""
1. Create a list of first names
2. Create a list of last names
3. Combine them randomly into a list of 100 full names
"""
Create a MySQL query (Python)
Python

"""
Table customers, columns = [CustomerId, FirstName, LastName, Company, Address, City, State, Country, PostalCode, Phone, Fax, Email, SupportRepId]
Create a MySQL query for all customers in Texas named Jane
"""
query =
Explaining code (JavaScript)
JavaScript

// Function 1
var fullNames = [];
for (var i = 0; i < 50; i++) {
  fullNames.push(names[Math.floor(Math.random() * names.length)] + " " + lastNames[Math.floor(Math.random() * lastNames.length)]);
}

// What does Function 1 do?
Start with a comment, data or code
You can experiment using one of the Codex models in our playground, styling instructions as comments when needed.
To get Codex to create a useful completion, it's helpful to think about what information a programmer would need to perform a task. This could simply be a clear comment or the data needed to write a useful function, like the names of variables or what class a function handles.
In this example we tell Codex what to call the function and what task it's going to perform.
Python

# Create a function called 'nameImporter' to add a first and last name to the database
This approach scales even to the point where you can provide Codex with a comment and an example of a database schema to get it to write useful query requests for various databases. Here's an example where we provide the columns and table names for the query.
Python

# Table albums, columns = [AlbumId, Title, ArtistId]
# Table artists, columns = [ArtistId, Name]
# Table media_types, columns = [MediaTypeId, Name]
# Table playlists, columns = [PlaylistId, Name]
# Table playlist_track, columns = [PlaylistId, TrackId]
# Table tracks, columns = [TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, UnitPrice]

# Create a query for all albums with more than 10 tracks
When you show Codex the database schema, it's able to make an informed guess about how to format a query.
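If your schema already lives in code, the comment block can be generated from it. build_schema_prompt is a hypothetical helper for illustration, not an SDK function:

```python
# Hypothetical helper: render a schema dictionary into the comment format
# shown above, followed by the task description for Codex to complete.
def build_schema_prompt(tables, task):
    lines = [
        f"# Table {name}, columns = [{', '.join(cols)}]"
        for name, cols in tables.items()
    ]
    lines.append(f"# {task}")
    return "\n".join(lines)

prompt = build_schema_prompt(
    {"albums": ["AlbumId", "Title", "ArtistId"]},
    "Create a query for all albums with more than 10 tracks",
)
```

Generating the prompt from the live schema keeps the comment block in sync with the database as tables change.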
Specify the programming language
Codex understands dozens of different programming languages. Many share similar conventions for comments, functions and other programming syntax. By specifying the language and what version in a comment, Codex is better able to provide a completion for what you want. That said, Codex is fairly flexible with style and syntax. Here's an example for R and Python.
R

# R language
# Calculate the mean distance between an array of points
Python

# Python 3
# Calculate the mean distance between an array of points
Prompt Codex with what you want it to do
If you want Codex to create a webpage, placing the first line of code in an HTML document ( <!DOCTYPE html> ) after your comment tells Codex what it should do next.
The same method works for creating a function from a comment (following the comment with a new line starting with func or def).
HTML

<!-- Create a web page with the title 'Kat Katman attorney at paw' -->
<!DOCTYPE html>
Placing <!DOCTYPE html> after our comment makes it very clear to Codex what we want it to do.
Or if we want to write a function we could start the prompt as follows and Codex will understand what it needs to do next.
Python

# Create a function to count to 100
def counter
Specifying libraries will help Codex understand what you want
Codex is aware of a large number of libraries, APIs and modules. By telling Codex which ones to use, either from a comment or importing them into your code, Codex will make suggestions based upon them instead of alternatives.
HTML

<!-- Use A-Frame version 1.2.0 to create a 3D website -->
<!-- https://aframe.io/releases/1.2.0/aframe.min.js -->
By specifying the version, you can make sure Codex uses the most current library.
Note
Codex can suggest helpful libraries and APIs, but always be sure to do your own research to make sure that they're safe for your application.
Comment style can affect code quality
With some languages, the style of comments can improve the quality of the output. For example, when working with Python, in some cases using doc strings (comments wrapped in triple quotes) can give higher quality results than using the pound ( # ) symbol.
Python

"""
Create an array of users and email addresses
"""
Comments inside of functions can be helpful
Recommended coding standards usually suggest placing the description of a function inside the function. Using this format helps Codex more clearly understand what you want the function to do.
Python

def getUserBalance(id):
    """
    Look up the user in the database 'UserData' and return their current account balance.
    """
Provide examples for more precise results
If you have a particular style or format you need Codex to use, providing examples or demonstrating it in the first part of the request will help Codex more accurately match what you need.
Python

"""
Create a list of random animals and species
"""
animals = [
    {"name": "Chomper", "species": "Hamster"},
    {"name":
Lower temperatures give more precise results
Setting the API temperature to 0, or close to zero (such as 0.1 or 0.2) tends to give better results in most cases. Unlike GPT-3 models, where a higher temperature can
provide useful creative and random results, higher temperatures with Codex models may give you really random or erratic responses.
In cases where you need Codex to provide different potential results, start at zero and then increment upwards by 0.1 until you find suitable variation.
Organize tasks into functions
We can get Codex to write functions by specifying what the function should do in as precise terms as possible in a comment. For example, by writing the following comment, Codex creates a JavaScript timer function that's triggered when a user presses a button:
A simple JavaScript timer
JavaScript

// Create a timer that creates an alert in 10 seconds
Creating example data
Testing applications often requires using example data. Because Codex is a language model that understands how to comprehend and write natural language, you can ask Codex to create data like arrays of made up names, products and other variables. For example, here we ask Codex to create an array of weather temperatures.
JavaScript

/* Create an array of weather temperatures for San Francisco */
Asking Codex to perform this task will produce a table like this:
JavaScript

var weather = [
  { month: 'January', high: 58, low: 48 },
  { month: 'February', high: 61, low: 50 },
  { month: 'March', high: 64, low: 53 },
  { month: 'April', high: 67, low: 55 },
  { month: 'May', high: 70, low: 58 },
  { month: 'June', high: 73, low: 61 },
  { month: 'July', high: 76, low: 63 },
  { month: 'August', high: 77, low: 64 },
  { month: 'September', high: 76, low: 63 },
  { month: 'October', high: 73, low: 61 },
  { month: 'November', high: 68, low: 57 },
  { month: 'December', high: 64, low: 54 }
];
Compound functions and small applications
We can provide Codex with a comment consisting of a complex request like creating a random name generator or performing tasks with user input and Codex can generate the rest provided there are enough tokens.
JavaScript

/*
Create a list of animals
Create a list of cities
Use the lists to generate stories about what I saw at the zoo in each city
*/
Limit completion size for more precise results or lower latency
Requesting longer completions in Codex can lead to imprecise answers and repetition. Limit the size of the query by reducing max_tokens and setting stop tokens. For instance, add \n as a stop sequence to limit completions to one line of code. Smaller completions also incur less latency.
Use streaming to reduce latency
Large Codex queries can take tens of seconds to complete. To build applications that require lower latency, such as coding assistants that perform autocompletion, consider using streaming. Responses will be returned before the model finishes generating the entire completion. Applications that need only part of a completion can reduce latency by cutting off a completion either programmatically or by using creative values for stop .
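The early-cutoff pattern can be sketched independently of the SDK. Below, collect_until_stop is a hypothetical helper, and the list of chunks stands in for the pieces a streaming response would yield:

```python
# Sketch of the cutoff pattern: accumulate streamed chunks and stop as
# soon as the stop string appears, instead of waiting for the full response.
def collect_until_stop(chunks, stop="\n"):
    text = ""
    for chunk in chunks:
        text += chunk
        if stop in text:
            # Cut off at the stop sequence and ignore the rest of the stream.
            return text.split(stop, 1)[0]
    return text

# The iterator stands in for streamed completion chunks (hypothetical data).
first_line = collect_until_stop(iter(["x = 1", "\ny = 2"]))
```

With a real streaming response, you would iterate over the response chunks the same way and close the connection once the stop string arrives.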
Users can combine streaming with duplication to reduce latency by requesting more than one solution from the API, and using the first response returned. Do this by setting n > 1. This approach consumes more token quota, so use it carefully (for example, by using reasonable settings for max_tokens and stop).
Use Codex to explain code
Codex's ability to create and understand code allows us to use it to perform tasks like explaining what the code in a file does. One way to accomplish this is by putting a comment after a function that starts with "This function" or "This application is." Codex will usually interpret this as the start of an explanation and complete the rest of the text.
JavaScript

/* Explain what the previous function is doing: It
Explaining an SQL query
In this example, we use Codex to explain in a human readable format what an SQL query is doing.
SQL

SELECT DISTINCT department.name
FROM department
JOIN employee ON department.id = employee.department_id
JOIN salary_payments ON employee.id = salary_payments.employee_id
WHERE salary_payments.date BETWEEN '2020-06-01' AND '2020-06-30'
GROUP BY department.name
HAVING COUNT(employee.id) > 10;
-- Explanation of the above query in human readable format
--
Writing unit tests
Creating a unit test can be accomplished in Python simply by adding the comment "Unit test" and starting a function.
Python

# Python 3
def sum_numbers(a, b):
    return a + b

# Unit test
def
Checking code for errors
By using examples, you can show Codex how to identify errors in code. In some cases no examples are required, but demonstrating the level of detail to provide in a description can help Codex understand what to look for and how to explain it. (A check by Codex for errors shouldn't replace careful review by the user.)
JavaScript

/* Explain why the previous function doesn't work. */
Using source data to write database functions
Just as a human programmer would benefit from understanding the database structure and the column names, Codex can use this data to help you write accurate query requests. In this example, we insert the schema for a database and tell Codex what to query the database for.
Python

# Table albums, columns = [AlbumId, Title, ArtistId]
# Table artists, columns = [ArtistId, Name]
# Table media_types, columns = [MediaTypeId, Name]
# Table playlists, columns = [PlaylistId, Name]
# Table playlist_track, columns = [PlaylistId, TrackId]
# Table tracks, columns = [TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, UnitPrice]

# Create a query for all albums with more than 10 tracks
Converting between languages
You can get Codex to convert from one language to another by following a simple format where you list the language of the code you want to convert in a comment, followed by the code and then a comment with the language you want it translated into.
Python

# Convert this from Python to R
# Python version

[ Python code ]

# End

# R version
Rewriting code for a library or framework
If you want Codex to make a function more efficient, you can provide it with the code to rewrite followed by an instruction on what format to use.
JavaScript

// Rewrite this as a React component
var input = document.createElement('input');
input.setAttribute('type', 'text');
document.body.appendChild(input);
var button = document.createElement('button');
button.innerHTML = 'Say Hello';
document.body.appendChild(button);
button.onclick = function() {
  var name = input.value;
  var hello = document.createElement('div');
  hello.innerHTML = 'Hello ' + name;
  document.body.appendChild(hello);
};
// React version:
Learn more about the underlying models that power Azure OpenAI.
Documentation
How to customize a model with Azure OpenAI - Azure OpenAI
Learn how to create your own customized model with Azure OpenAI
How to generate text with Azure OpenAI - Azure OpenAI
Learn how to generate or manipulate text, including code with Azure OpenAI
How to prepare a dataset for custom model training - Azure OpenAI
Learn how to prepare your dataset for fine-tuning
Azure OpenAI models - Azure OpenAI
Learn about the different models that are available in Azure OpenAI.
Quickstart - Deploy a model and generate text using Azure OpenAI - Azure OpenAI
Walkthrough on how to get started with Azure OpenAI and make your first completions call.
Azure OpenAI embeddings tutorial - Azure OpenAI
Learn how to use the Azure OpenAI embeddings API for document search with the BillSum dataset
Azure OpenAI REST API reference - Azure OpenAI
Learn how to use the Azure OpenAI REST API. In this article, you'll learn about authorization options, how to structure a request and receive a response.
Azure OpenAI content filtering - Azure OpenAI
Learn about the content filtering capabilities of the OpenAI service in Azure Cognitive Services
Article • 04/20/2023
An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar.
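A common way to measure that closeness is cosine similarity. The following is a minimal Python sketch using toy vectors (stand-ins, not real embeddings) to illustrate the comparison:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity = dot(a, b) / (||a|| * ||b||); higher means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of similar vs. dissimilar texts.
v_cat = [0.9, 0.1, 0.05]
v_kitten = [0.85, 0.15, 0.1]
v_truck = [0.05, 0.9, 0.4]

print(cosine_similarity(v_cat, v_kitten) > cosine_similarity(v_cat, v_truck))
```

With real embedding vectors the same function applies unchanged; only the vector length differs.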
To obtain an embedding vector for a piece of text, we make a request to the embeddings endpoint as shown in the following code snippets:
Console

curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2022-12-01 \
  -H 'Content-Type: application/json' \
  -H 'api-key: YOUR_API_KEY' \
  -d '{"input": "Sample Document goes here"}'
Verify inputs don't exceed the maximum length
The maximum length of input text for our embedding models is 2048 tokens (equivalent to around 2-3 pages of text). You should verify that your inputs don't exceed this limit before making a request.
Choose the best model for your task
For the search models, you can obtain embeddings in two ways. The <search_model>-doc model is used for longer pieces of text (to be searched over) and the <search_model>-query model is used for shorter pieces of text, typically queries or class labels in zero shot classification. You can read more about all of the Embeddings models in our Models guide.
Replace newlines with a single space
Unless you're embedding code, we suggest replacing newlines (\n) in your input with a single space, as we have observed inferior results when newlines are present.
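As a sketch, a small helper (the name is illustrative) that performs this normalization:

```python
import re

def normalize_for_embedding(text: str) -> str:
    # Collapse each newline, plus any surrounding whitespace, into a single space.
    return re.sub(r"\s*\n\s*", " ", text).strip()

print(normalize_for_embedding("First line.\nSecond line.\n"))
# → "First line. Second line."
```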
Our embedding models may be unreliable or pose social risks in certain cases, and may cause harm in the absence of mitigations. Review our Responsible AI content for more information on how to approach their use responsibly.
Learn more about using Azure OpenAI and embeddings to perform document search with our embeddings tutorial.
Learn more about the underlying models that power Azure OpenAI.
Article • 03/14/2023
The first step of customizing your model is to prepare a high quality dataset. To do this you'll need a set of training examples composed of single input prompts and the associated desired output ('completion'). This format is notably different than using models during inference in the following ways:
Only provide a single prompt vs a few examples.
You don't need to provide detailed instructions as part of the prompt.
Each prompt should end with a fixed separator to inform the model when the prompt ends and the completion begins. A simple separator that generally works well is \n\n###\n\n . The separator shouldn't appear elsewhere in any prompt.
Each completion should start with a whitespace due to our tokenization, which tokenizes most words with a preceding whitespace.
Each completion should end with a fixed stop sequence to inform the model when the completion ends. A stop sequence could be \n , ### , or any other token that doesn't appear in any completion.
For inference, you should format your prompts in the same way as you did when creating the training dataset, including the same separator. Also specify the same stop sequence to properly truncate the completion.
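Putting these conventions together, a hypothetical helper (the names and the END stop sequence are illustrative choices) for writing training lines and formatting inference prompts consistently:

```python
import json

SEPARATOR = "\n\n###\n\n"   # fixed separator between prompt and completion
STOP = " END"               # fixed stop sequence ending every completion

def make_training_line(prompt: str, completion: str) -> str:
    # Completion starts with a leading space (tokenization) and ends with the stop sequence.
    assert SEPARATOR not in prompt, "separator must not appear inside the prompt"
    return json.dumps({"prompt": prompt + SEPARATOR,
                       "completion": " " + completion + STOP})

def make_inference_prompt(prompt: str) -> str:
    # At inference time, format prompts exactly as in training (same separator).
    return prompt + SEPARATOR

print(make_training_line("Overjoyed with the new iPhone!", "positive"))
```

At inference time you would then pass stop=[" END"] so the completion is truncated where the training examples ended.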
The dataset cannot exceed 100 MB in total file size.
Customization performs better with high-quality examples and the more you have, generally the better the model performs. We recommend that you provide at least a few hundred high-quality examples to achieve a model that performs better than using well-designed prompts with a base model. From there, performance tends to linearly increase with every doubling of the number of examples. Increasing the number of examples is usually the best and most reliable way of improving performance.
If you're fine-tuning on a pre-existing dataset rather than writing prompts from scratch, be sure to manually review your data for offensive or inaccurate content if possible, or review as many random samples of the dataset as possible if it's large.
Fine-tuning can solve various problems, and the optimal way to use it may depend on your specific use case. Below, we've listed the most common use cases for fine-tuning and corresponding guidelines.
Classification
Classifiers are the easiest models to get started with. For classification problems we suggest using ada, which generally tends to perform only very slightly worse than more capable models once fine-tuned, while being significantly faster. In classification problems, each prompt in the dataset should be classified into one of the predefined classes. For this type of problem, we recommend:
Use a separator at the end of the prompt, for example, \n\n###\n\n . Remember to also append this separator when you eventually make requests to your model.
Choose classes that map to a single token. At inference time, specify
max_tokens=1 since you only need the first token for classification.
Ensure that the prompt + completion doesn't exceed 2048 tokens, including the separator
Aim for at least 100 examples per class
To get class log probabilities, you can specify logprobs=5 (for five classes) when using your model
Ensure that the dataset used for fine-tuning is very similar in structure and type of task as what the model will be used for
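A quick sanity check on per-class example counts might look like this sketch (the dataset literals are toy examples):

```python
from collections import Counter

def class_counts(dataset):
    # Count training examples per completion label (class).
    return Counter(example["completion"] for example in dataset)

dataset = [
    {"prompt": "Ad text 1\n\n###\n\n", "completion": " yes"},
    {"prompt": "Ad text 2\n\n###\n\n", "completion": " no"},
    {"prompt": "Ad text 3\n\n###\n\n", "completion": " yes"},
]

counts = class_counts(dataset)
# Flag classes that fall short of the ~100-examples-per-class guideline.
underfilled = [label for label, n in counts.items() if n < 100]
print(counts, underfilled)
```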
Case study: Is the model making untrue statements?
Let's say you'd like to ensure that the text of the ads on your website mention the correct product and company. In other words, you want to ensure the model isn't making things up. You may want to fine-tune a classifier which filters out incorrect ads.
The dataset might look something like the following:
JSON

{"prompt":"Company: BHFF insurance\nProduct: allround insurance\nAd:One stop shop for all your insurance needs!\nSupported:", "completion":" yes"}
{"prompt":"Company: Loft conversion specialists\nProduct: -\nAd:Straight teeth in weeks!\nSupported:", "completion":" no"}
In the example above, we used a structured input containing the name of the company, the product, and the associated ad. As a separator we used \nSupported: , which clearly separated the prompt from the completion. With a sufficient number of examples, the separator you choose doesn't make much of a difference (usually less than 0.4%) as long as it doesn't appear within the prompt or the completion.
For this use case we fine-tuned an ada model since it is faster and cheaper, and the performance is comparable to larger models because it's a classification task.
Now we can query our model by making a Completion request.
Console

curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2022-12-01 \
  -H 'Content-Type: application/json' \
  -H 'api-key: YOUR_API_KEY' \
  -d '{
    "prompt": "Company: Reliable accountants Ltd\nProduct: Personal Tax help\nAd:Best advice in town!\nSupported:",
    "max_tokens": 1
  }'
Which will return either yes or no .
Case study: Sentiment analysis
Let's say you'd like to gauge the degree to which a particular tweet is positive or negative. The dataset might look something like the following:
JSON

{"prompt":"Overjoyed with the new iPhone! ->", "completion":" positive"}
{"prompt":"@contoso_basketball disappoint for a third straight night. ->", "completion":" negative"}
Once the model is fine-tuned, you can get back the log probabilities for the first completion token by setting logprobs=2 on the completion request. The higher the probability for the positive class, the higher the relative sentiment. Now we can query our model by making a Completion request.

Console
curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2022-12-01 \
  -H 'Content-Type: application/json' \
  -H 'api-key: YOUR_API_KEY' \
  -d '{
    "prompt": "Excited to share my latest blog post! ->",
    "max_tokens": 1,
    "logprobs": 2
  }'
Which will return:
JSON

{
  "object": "text_completion",
  "created": 1589498378,
  "model": "YOUR_FINE_TUNED_MODEL_NAME",
  "choices": [
    {
      "logprobs": {
        "text_offset": [
          19
        ],
        "token_logprobs": [
          -0.03597255
        ],
        "tokens": [
          " positive"
        ],
        "top_logprobs": [
          {
            " negative": -4.9785037,
            " positive": -0.03597255
          }
        ]
      },
      "text": " positive",
      "index": 0,
      "finish_reason": "length"
    }
  ]
}
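Log probabilities are natural logs of token probabilities, so exponentiating recovers the probabilities themselves. Applying that to the response above:

```python
import math

# top_logprobs values taken from the sample response.
top_logprobs = {" negative": -4.9785037, " positive": -0.03597255}

# exp(logprob) converts a log probability back into a probability.
probs = {token: math.exp(lp) for token, lp in top_logprobs.items()}
sentiment = max(probs, key=probs.get)
print(sentiment, round(probs[" positive"], 4))
# probability of " positive" ≈ 0.965
```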
Case study: Categorization for Email triage
Let's say you'd like to categorize incoming email into one of a large number of predefined categories. For classification into a large number of categories, we recommend you convert those categories into numbers, which will work well with up to approximately 500 categories. We've observed that adding a space before the number sometimes slightly helps the performance, due to tokenization. You may want to structure your training data as follows:
JSON

{ "prompt":"Subject: <email_subject>\nFrom:<customer_name>\nDate:<date>\nContent:<email_body>\n\n###\n\n", "completion":" <numerical_category>" }
For example:
JSON

{ "prompt":"Subject: Update my address\nFrom:Joe Doe\nTo:support@ourcompany.com\nDate:2021-06-03\nContent:Hi,\nI would like to update my billing address to match my delivery address.\n\nPlease let me know once done.\n\nThanks,\nJoe\n\n###\n\n", "completion":" 4" }
In the example above, we used an incoming email capped at 2043 tokens as input. (This allows for a four-token separator and a one-token completion, summing up to 2048.) As a separator we used \n\n###\n\n and we removed any occurrence of ### within the email.
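As a sketch of that preprocessing (the helper name is illustrative, and the cap is approximated by word count because exact counting requires the model's tokenizer):

```python
SEPARATOR = "\n\n###\n\n"

def build_email_prompt(subject, sender, date, body, max_units=2043):
    # Remove any occurrence of "###" so the separator stays unambiguous.
    body = body.replace("###", "")
    text = f"Subject: {subject}\nFrom:{sender}\nDate:{date}\nContent:{body}"
    # Rough cap; a real implementation would count tokens, not words.
    words = text.split(" ")
    if len(words) > max_units:
        text = " ".join(words[:max_units])
    return text + SEPARATOR

prompt = build_email_prompt("Update my address", "Joe Doe", "2021-06-03",
                            "Hi,\nPlease update my billing ### address.")
print(prompt.endswith(SEPARATOR), "###" in prompt[:-len(SEPARATOR)])
```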
Conditional generation
Conditional generation is a problem where the content needs to be generated given some kind of input. This includes paraphrasing, summarizing, entity extraction, product description writing given specifications, chatbots and many others. For this type of problem we recommend:
Use a separator at the end of the prompt, for example, \n\n###\n\n . Remember to also append this separator when you eventually make requests to your model.
Use an ending token at the end of the completion, for example, END .
Remember to add the ending token as a stop sequence during inference, for example, stop=[" END"] .
Aim for at least ~500 examples.
Ensure that the prompt + completion doesn't exceed 2048 tokens, including the separator.
Ensure the examples are of high quality and follow the same desired format. Ensure that the dataset used for fine-tuning is similar in structure and type of task as what the model will be used for.
Using a lower learning rate and only 1-2 epochs tends to work better for these use cases.
Case study: Write an engaging ad based on a Wikipedia article
This is a generative use case so you would want to ensure that the samples you provide are of the highest quality, as the fine-tuned model will try to imitate the style (and mistakes) of the given examples. A good starting point is around 500 examples. A sample dataset might look like this:
JSON

{ "prompt":"<Product Name>\n<Wikipedia description>\n\n###\n\n", "completion":" <engaging ad> END" }
For example:
JSON

{ "prompt":"Samsung Galaxy Feel\nThe Samsung Galaxy Feel is an Android smartphone developed by Samsung Electronics exclusively for the Japanese market. The phone was released in June 2017 and was sold by NTT Docomo. It runs on Android 7.0 (Nougat), has a 4.7 inch display, and a 3000 mAh battery.\nSoftware\nSamsung Galaxy Feel runs on Android 7.0 (Nougat), but can be later updated to Android 8.0 (Oreo).\nHardware\nSamsung Galaxy Feel has a 4.7 inch Super AMOLED HD display, 16 MP back facing and 5 MP front facing cameras. It has a 3000 mAh battery, a 1.6 GHz Octa-Core ARM Cortex-A53 CPU, and an ARM Mali-T830 MP1 700 MHz GPU. It comes with 32GB of internal storage, expandable to 256GB via microSD. Aside from its software and hardware specifications, Samsung also introduced a unique hole in the phone's shell to accommodate the Japanese perceived penchant for personalizing their mobile phones. The Galaxy Feel's battery was also touted as a major selling point since the market favors handsets with longer battery life. The device is also waterproof and supports 1seg digital broadcasts using an antenna that is sold separately.\n\n###\n\n", "completion":"Looking for a smartphone that can do it all? Look no further than Samsung Galaxy Feel! With a slim and sleek design, our latest smartphone features high-quality picture and video capabilities, as well as an award winning battery life. END" }
Here we used a multiline separator, as Wikipedia articles contain multiple paragraphs and headings. We also used a simple end token, to ensure that the model knows when the completion should finish.
Case study: Entity extraction
This is similar to a language transformation task. To improve the performance, it's best to either sort different extracted entities alphabetically or in the same order as they appear in the original text. This helps the model to keep track of all the entities which need to be generated in order. The dataset could look as follows:
JSON

{ "prompt":"<any text, for example news article>\n\n###\n\n", "completion":" <list of entities, separated by a newline> END" }
For example:
JSON

{ "prompt":"Portugal will be removed from the UK's green travel list from Tuesday, amid rising coronavirus cases and concern over a \"Nepal mutation of the so-called Indian variant\". It will join the amber list, meaning holidaymakers should not visit and returnees must isolate for 10 days...\n\n###\n\n", "completion":" Portugal\nUK\nNepal mutation\nIndian variant END" }
A multi-line separator works best, as the text will likely contain multiple lines. Ideally there will be a high diversity of the types of input prompts (news articles, Wikipedia pages, tweets, legal documents), which reflect the likely texts which will be encountered when extracting entities.
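A sketch of ordering entities by first appearance in the source text, per the recommendation above (the helper names are illustrative):

```python
def entities_in_order(text, entities):
    # Keep only entities present in the text, ordered by first appearance.
    found = [e for e in entities if e in text]
    return sorted(found, key=text.index)

def make_completion(text, entities):
    # Completion: leading space, newline-separated entities, END stop token.
    return " " + "\n".join(entities_in_order(text, entities)) + " END"

article = "Portugal will be removed from the UK's green travel list..."
print(make_completion(article, ["UK", "Portugal", "France"]))
```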
Case study: Customer support chatbot
A chatbot will normally contain relevant context about the conversation (order details), summary of the conversation so far, and most recent messages. For this use case the same past conversation can generate multiple rows in the dataset, each time with a
slightly different context, for every agent generation as a completion. This use case requires a few thousand examples, as it likely deals with different types of requests, and customer issues. To ensure the performance is of high quality, we recommend vetting the conversation samples to ensure the quality of agent messages. The summary can be generated with a separate text transformation fine tuned model. The dataset could look as follows:
JSON

{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent:", "completion":" <response2>\n"}
{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent: <response2>\nCustomer: <message3>\nAgent:", "completion":" <response3>\n"}
Here we purposefully separated different types of input information, but maintained the Customer Agent dialog in the same format between a prompt and a completion. All the completions should only be by the agent, and we can use \n as a stop sequence when doing inference.
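A sketch of expanding one conversation into multiple training rows, one per agent reply (the function and field layout are illustrative, following the dataset format above):

```python
def conversation_to_rows(summary, info, turns):
    # turns: list of (speaker, message) pairs, starting with the customer.
    # Emit one training row per agent reply, with all prior turns as context.
    rows = []
    context = f"Summary: {summary}\n\nSpecific information:{info}\n\n###\n\n"
    history = ""
    for speaker, message in turns:
        if speaker == "Agent":
            rows.append({"prompt": context + history + "Agent:",
                         "completion": " " + message + "\n"})
            history += f"Agent: {message}\n"
        else:
            history += f"Customer: {message}\n"
    return rows

rows = conversation_to_rows("order inquiry", "order #123",
                            [("Customer", "Where is my order?"),
                             ("Agent", "It ships tomorrow."),
                             ("Customer", "Thanks!"),
                             ("Agent", "You're welcome.")])
print(len(rows))  # one row per agent reply
```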
Case study: Product description based on a technical list of properties
Here it's important to convert the input data into a natural language, which will likely lead to superior performance. For example, the following format:
JSON

{ "prompt":"Item=handbag, Color=army_green, price=$99, size=S->", "completion":"This stylish small green handbag will add a unique touch to your look, without costing you a fortune." }
Won't work as well as:
JSON

{ "prompt":"Item is a handbag. Colour is army green. Price is midrange. Size is small.->", "completion":"This stylish small green handbag will add a unique touch to your look, without costing you a fortune." }
For high performance, ensure that the completions were based on the description provided. If external content is often consulted, then adding such content in an automated way would improve the performance. If the description is based on images, it may help to use an algorithm to extract a textual description of the image. Since completions are only one sentence long, we can use . as the stop sequence during inference.
Open ended generation
For this type of problem we recommend:
Leave the prompt empty.
No need for any separators.
You'll normally want a large number of examples, at least a few thousand.
Ensure the examples cover the intended domain or the desired tone of voice.
Case study: Maintaining company voice
Many companies have a large amount of high quality content generated in a specific voice. Ideally all generations from our API should follow that voice for the different use cases. Here we can use the trick of leaving the prompt empty, and feeding in all the documents which are good examples of the company voice. A fine-tuned model can be used to solve many different use cases with similar prompts to the ones used for base models, but the outputs are going to follow the company voice much more closely than previously.
JSON

{"prompt":"", "completion":" <company voice textual content>"}
{"prompt":"", "completion":" <company voice textual content2>"}
A similar technique could be used for creating a virtual character with a particular personality, style of speech and topics the character talks about.
Generative tasks have a potential to leak training data when requesting completions from the model, so extra care needs to be taken that this is addressed appropriately. For example personal or sensitive company information should be replaced by generic information or not be included into fine-tuning in the first place.
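For example, a minimal redaction pass before including documents in a fine-tuning set (the patterns are illustrative and far from exhaustive):

```python
import re

def redact(text):
    # Replace email addresses and phone-like numbers with generic placeholders.
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<EMAIL>", text)
    text = re.sub(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b", "<PHONE>", text)
    return text

print(redact("Contact jane.doe@contoso.com or 555-123-4567."))
```

A production pipeline would normally use a dedicated PII-detection service rather than hand-written patterns.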
Fine tune your model with our How-to guide
Learn more about the underlying models that power Azure OpenAI Service
Article • 04/05/2023
Azure OpenAI Service lets you tailor our models to your personal datasets using a process known as fine-tuning. This customization step will let you get more out of the service by providing:
Higher quality results than what you can get just from prompt design
The ability to train on more examples than can fit into a prompt
Lower-latency requests
A customized model improves on the few-shot learning approach by training the model's weights on your specific prompts and structure. The customized model lets you achieve better results on a wider number of tasks without needing to provide examples in your prompt. The result is less text sent and fewer tokens processed on every API call, saving cost and improving request latency.
Note

There is a breaking change in the create fine-tunes command in the latest 2022-12-01 GA API. For the latest command syntax, consult the reference documentation.
An Azure subscription - Create one for free
Access granted to Azure OpenAI in the desired Azure subscription
Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access . If you run into a problem, open an issue on this repo to contact us.
An Azure OpenAI resource
For more information about creating a resource, see Create a resource and deploy a model using Azure OpenAI.
The fine-tuning workflow in Azure OpenAI Studio requires the following steps:
Prepare your training and validation data
Use the Create customized model wizard in Azure OpenAI Studio to train your customized model
Select a base model
Choose your training data
Optionally, choose your validation data
Optionally, choose advanced options for your fine-tune job
Review your choices and train your new customized model
Check the status of your customized model
Deploy your customized model for use
Use your customized model
Optionally, analyze your customized model for performance and fit
Your training data and validation data sets consist of input & output examples for how you would like the model to perform.
The training and validation data you use must be formatted as a JSON Lines (JSONL) document in which each line represents a single prompt-completion pair. The OpenAI command-line interface (CLI) includes a data preparation tool that validates, gives suggestions, and reformats your training data into a JSONL file ready for fine-tuning.
Here's an example of the training data format:
JSON

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
In addition to the JSONL format, training and validation data files must be encoded in UTF-8 and include a byte-order mark (BOM), and the file must be less than 200 MB in size. For more information about formatting your training data, see Learn how to prepare your dataset for fine-tuning.
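A lightweight pre-flight check for those constraints might look like this sketch (size and key checks only; the OpenAI CLI data preparation tool is more thorough):

```python
import json, os

MAX_BYTES = 200 * 1024 * 1024  # 200 MB limit

def check_training_file(path):
    # Return a list of human-readable problems found in a JSONL training file.
    problems = []
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append("file is 200 MB or larger")
    with open(path, encoding="utf-8-sig") as f:  # utf-8-sig tolerates a BOM
        for i, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"line {i}: not valid JSON")
                continue
            if set(record) != {"prompt", "completion"}:
                problems.append(f"line {i}: expected exactly prompt/completion keys")
    return problems
```

Run it against your training and validation files before uploading; an empty list means the basic checks passed.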
Creating your training and validation datasets
Designing your prompts and completions for fine-tuning is different from designing your prompts for use with any of our GPT-3 base models. Prompts for completion calls
often use either detailed instructions or few-shot learning techniques, and consist of multiple examples. For fine-tuning, we recommend that each training example consists of a single input prompt and its desired completion output. You don't need to give detailed instructions or multiple completion examples for the same prompt.
The more training examples you have, the better. We recommend having at least 200 training examples. In general, we've found that each doubling of the dataset size leads to a linear increase in model quality.
For more information about preparing training data for various tasks, see Learn how to prepare your dataset for fine-tuning.
OpenAI CLI data preparation tool
We recommend using OpenAI's command-line interface (CLI) to assist with many of the data preparation steps. OpenAI has developed a tool that validates, gives suggestions, and reformats your data into a JSONL file ready for fine-tuning.
To install the CLI, run the following Python command:
Console

pip install --upgrade openai
To analyze your training data with the data preparation tool, run the following command, replacing <LOCAL_FILE> with the full path and file name of the training data file to be analyzed:
Console

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>
This tool accepts files in the following data formats, if they contain a prompt and a completion column/key:
Comma-separated values (CSV)
Tab-separated values (TSV)
Microsoft Excel workbook (XLSX)
JavaScript Object Notation (JSON)
JSON Lines (JSONL)
The tool reformats your training data and saves output into a JSONL file ready for fine-tuning, after guiding you through the process of implementing suggested changes.
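If you prefer to reformat a CSV yourself, the conversion reduces to a few lines of Python (a sketch; it doesn't replicate the CLI tool's validation or suggestions):

```python
import csv, json

def csv_to_jsonl(csv_path, jsonl_path):
    # Convert a CSV with 'prompt' and 'completion' columns to JSON Lines.
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            dst.write(json.dumps({"prompt": row["prompt"],
                                  "completion": row["completion"]}) + "\n")
```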
Azure OpenAI Studio provides the Create customized model wizard, so you can interactively create and train a fine-tuned model for your Azure resource.
Go to the Azure OpenAI Studio
Navigate to the Azure OpenAI Studio at https://oai.azure.com/ and sign in with credentials that have access to your Azure OpenAI resource. During the sign-in workflow, select the appropriate directory, Azure subscription, and Azure OpenAI resource.
Landing page
You'll first land on our main page for Azure OpenAI Studio. From here, you can start fine-tuning a custom model.
Select the Start fine-tuning a custom model button under Manage deployments and models section of the landing page, highlighted in the following picture, to start fine-tuning a custom model.
Note
If your resource doesn't have a model already deployed in it, a warning is displayed. You can ignore that warning for the purposes of fine-tuning a model, because you'll be fine-tuning and deploying a new customized model.
Start the wizard from the Models page
To create a customized model, select the Create customized model button under the Provided models section on the Models page, highlighted in the following picture, to start the Create customized model wizard.
Select a base model
The first step in creating a customized model is to choose a base model. The Base
model pane lets you choose a base model to use for your customized model, and the choice influences both the performance and the cost of your model. You can create a customized model from one of the following available base models:
ada
babbage
curie
code-cushman-001 *
davinci *
* currently unavailable for new customers.
For more information about our base models that can be fine-tuned, see Models. Select a base model from the Base model type dropdown, as shown in the following picture, and then select Next to continue.
Choose your training data
The next step is to either choose existing prepared training data or upload new prepared training data to use when customizing your model. The Training data pane, shown in the following picture, displays any existing, previously uploaded datasets and provides options by which you can upload new training data.
If your training data has already been uploaded to the service, select Choose dataset, and then select the file from the list shown in the Training data pane. Otherwise, select either Local file to upload training data from a local file, or Azure blob or other shared web locations to import training data from Azure Blob or another shared web location.
For large data files, we recommend you import from an Azure Blob store. Large files can become unstable when uploaded through multipart forms because the requests are atomic and can't be retried or resumed. For more information about Azure Blob storage, see What is Azure Blob storage?
Note
Training data files must be formatted as JSONL files, encoded in UTF-8 with a byte-order mark (BOM), and less than 200 MB in size.
To upload training data from a local file
You can upload a new training dataset to the service from a local file by using one of the following methods:
Drag and drop the file into the client area of the Training data pane, and then select Upload file
Select Browse for a file from the client area of the Training data pane, choose the file to upload from the Open dialog, and then select Upload file.
After you've selected and uploaded the training dataset, select Next to optionally choose your validation data.
To import training data from an Azure Blob store
You can import a training dataset from Azure Blob or another shared web location by providing the name and location of the file, as shown in the following picture. Enter the name of the file in File name and the Azure Blob URL, Azure Storage shared access signature (SAS), or other link to an accessible shared web location that contains the file in File location, then select Upload file to import the training dataset to the service.
After you've selected and uploaded the training dataset, select Next to optionally choose your validation data.
Choose your validation data
You can now choose to optionally use validation data in the training process of your fine-tuned model. If you don't want to use validation data, you can choose Next to choose advanced options for your model. Otherwise, if you have a validation dataset, you can either choose existing prepared validation data or upload new prepared validation data to use when customizing your model. The Validation data pane, shown in the following picture, displays any existing, previously uploaded training and validation datasets and provides options by which you can upload new validation data.
If your validation data has already been uploaded to the service, select Choose dataset, and then select the file from the list shown in the Validation data pane. Otherwise, select either Local file to upload validation data from a local file, or Azure blob or other shared web locations to import validation data from Azure Blob or another shared web location.
For large data files, we recommend you import from an Azure Blob store. Large files can become unstable when uploaded through multipart forms because the requests are atomic and can't be retried or resumed.
Note
Like training data files, validation data files must be formatted as JSONL files, encoded in UTF-8 with a byte-order mark (BOM), and less than 200 MB in size.
To upload validation data from a local file
You can upload a new validation dataset to the service from a local file by using one of the following methods:
Drag and drop the file into the client area of the Validation data pane, and then select Upload file
Select Browse for a file from the client area of the Validation data pane, choose the file to upload from the Open dialog, and then select Upload file.
After you've uploaded the validation dataset, select Next to optionally choose advanced options.
To import validation data from an Azure Blob store
You can import a validation dataset from Azure Blob or another shared web location by providing the name and location of the file, as shown in the following picture. Enter the name of the file in File name and the Azure Blob URL, Azure Storage shared access signature (SAS), or other link to an accessible shared web location that contains the file in File location, then select Upload file to import the validation dataset to the service.
After you've imported the validation dataset, select Next to optionally choose advanced options.
Choose advanced options
You can either use default values for the hyperparameters of the fine-tune job that the wizard runs to train your fine-tuned model, or you can adjust those hyperparameters for your customization needs in the Advanced options pane, shown in the following picture.
Either select Default to use the default values for the fine-tune job, or select Advanced
to display and edit the hyperparameter values, as shown in the following picture.
The following hyperparameters are available:
Parameter name | Description |
Number of epochs | The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset. |
Batch size | The batch size to use for training. The batch size is the number of training examples used to train a single forward and backward pass. |
Learning rate multiplier | The learning rate multiplier to use for training. The fine-tuning learning rate is the original learning rate used for pre-training, multiplied by this value. |
Prompt loss weight | The weight to use for loss on the prompt tokens. This value controls how much the model tries to learn to generate the prompt (as compared to the completion, which always has a weight of 1.0.) Increasing this value can add a stabilizing effect to training when completions are short. |
For more information about these hyperparameters, see the Create a Fine tune job section of the REST API documentation.
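As a sketch of how these hyperparameters map onto a Create fine-tune request body (field names follow the OpenAI fine-tunes API; the file ID and values here are illustrative, so check the REST reference above for the exact Azure schema):

```python
import json

payload = {
    "model": "curie",                  # base model to fine-tune
    "training_file": "file-abc123",    # ID of an uploaded training file (illustrative)
    "n_epochs": 4,                     # Number of epochs
    "batch_size": 4,                   # Batch size
    "learning_rate_multiplier": 0.1,   # Learning rate multiplier
    "prompt_loss_weight": 0.01,        # Prompt loss weight
}
print(json.dumps(payload, indent=2))
```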
After you've chosen either default or advanced options, select Next to review your choices and train your fine-tuned model.
Review your choices and train your model
The Review and train pane of the wizard displays information about the choices you've made in the Create customized model wizard for your fine-tuned model, as shown in the following picture.
If you're ready to train your model, select Save and close to start the fine-tune job and return to the Models page.
The Models page displays information about your customized model in the Customized models tab, as shown in the following picture. The tab includes information about the status and job ID of the fine-tune job for your customized model. When the job is completed, the file ID of the result file is also displayed.
After you've started a fine-tune job, it may take some time to complete. Your job may be queued behind other jobs on our system, and training your model can take minutes or hours depending on the model and dataset size. You can check the status of the fine-tune job for your customized model in the Status column of the Customized models tab on the Models page, and you can select Refresh to update the information on that page.
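Because a fine-tune job can take minutes or hours, scripts typically poll until the job reaches a terminal status rather than refreshing by hand. The helper below is a minimal sketch of that loop; get_status stands in for whatever call you use to fetch the job status, and the status strings are illustrative.

```python
import time

def wait_for_job(get_status, poll_interval=0.0):
    """Poll get_status() until a terminal fine-tune status is returned."""
    terminal = {"succeeded", "failed", "canceled"}
    while True:
        status = get_status()
        if status in terminal:
            return status
        time.sleep(poll_interval)

# Demo with a stubbed status sequence instead of a live API call:
statuses = iter(["notRunning", "running", "running", "succeeded"])
print(wait_for_job(lambda: next(statuses)))
```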
You can also select the name of the model from the Model name column of the Models page to display more information about your customized model, including the status of the fine-tune job, training results, training events, and hyperparameters used in the job. You can select the Refresh button to refresh the information for your model, as shown in the following picture.
From the model page, you can also select Download training file to download the training data you used for the model, or select Download results to download the result file attached to the fine-tune job for your model and analyze your customized model for training and validation performance.
When the fine-tune job has succeeded, you can deploy the customized model from the Models pane. You must deploy your customized model to make it available for use with completion calls.
Important
After a customized model is deployed, if at any time the deployment remains inactive for greater than fifteen (15) days, the deployment is automatically deleted. The deployment of a customized model is "inactive" if the model was deployed more than fifteen (15) days ago and no completions or chat completions calls were made to it during a continuous 15-day period. The deletion of an inactive deployment doesn't delete or affect the underlying customized model, and the customized model can be redeployed at any time. As described in Azure OpenAI Service pricing, each customized (fine-tuned) model that is deployed incurs an hourly hosting cost regardless of whether completions or chat completions calls are being made to the model. To learn more about planning and managing costs with Azure OpenAI, refer to our cost management guide.
Note
Only one deployment is permitted for a customized model. An error message is displayed if you select an already-deployed customized model.
To deploy your customized model, select the customized model to be deployed and then select Deploy model, as shown in the following picture.
The Deploy model dialog is presented, in which you can provide a name for the deployment of your customized model. Enter a name in Deployment name and then select Create to start the deployment of your customized model.
You can monitor the progress of your deployment from the Deployments pane of Azure OpenAI Studio.
Once your customized model has been deployed, you can use it like any other deployed model. For example, you can use the Playground pane of Azure OpenAI Studio to experiment with your new deployment, as shown in the following picture. You can continue to use the same parameters with your customized model, such as temperature and frequency penalty, as you can with other deployed models.
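For instance, a completion request to a customized deployment takes the same body as a request to any other deployment. The sketch below assembles one with temperature and frequency_penalty set; the prompt and parameter values are illustrative, and the resource and deployment names in the comment are placeholders.

```python
import json

# Hypothetical request body for a completion call to a customized deployment.
request_body = {
    "prompt": "Write a haiku about fine-tuning:",
    "max_tokens": 60,
    "temperature": 0.7,        # same sampling controls as other deployed models
    "frequency_penalty": 0.5,
}

# POST this body to (placeholders for resource and deployment names):
# https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_CUSTOM_DEPLOYMENT/completions?api-version=2022-12-01
print(json.dumps(request_body))
```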
Note
As with all applications, we require a review process prior to going live.
Azure OpenAI attaches a result file, named results.csv , to each fine-tune job once it's completed. You can use the result file to analyze the training and validation performance of your customized model. The file ID for the result file is listed for each customized model in the Result file Id column of the Models pane for Azure OpenAI Studio. You can use the file ID to identify and download the result file from the File Management pane of Azure OpenAI Studio.
The result file is a CSV file containing a header row and a row for each training step performed by the fine-tune job. The result file contains the following columns:
Column name | Description |
step | The number of the training step. A training step represents a single pass, forward and backward, on a batch of training data. |
elapsed_tokens | The number of tokens the customized model has seen so far, including repeats. |
elapsed_examples | The number of examples the model has seen so far, including repeats. Each example represents one element in that step's batch of training data. For example, if the Batch size parameter is set to 32 in the Advanced options pane, this value increments by 32 in each training step. |
training_loss | The loss for the training batch. |
training_sequence_accuracy | The percentage of completions in the training batch for which the model's predicted tokens exactly matched the true completion tokens. For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.67 (2 of 3) if the model predicted [[1, 1], [0, 5], [4, 2]] . |
training_token_accuracy | The percentage of tokens in the training batch that were correctly predicted by the model. For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.83 (5 of 6) if the model predicted [[1, 1], [0, 5], [4, 2]] . |
validation_loss | The loss for the validation batch. |
validation_sequence_accuracy | The percentage of completions in the validation batch for which the model's predicted tokens exactly matched the true completion tokens. For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.67 (2 of 3) if the model predicted [[1, 1], [0, 5], [4, 2]] . |
validation_token_accuracy | The percentage of tokens in the validation batch that were correctly predicted by the model. For example, if the batch size is set to 3 and your data contains completions [[1, 2], [0, 5], [4, 2]] , this value is set to 0.83 (5 of 6) if the model predicted [[1, 1], [0, 5], [4, 2]] . |
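As a sketch of how you might analyze a downloaded result file, the snippet below parses a small synthetic results.csv (using a subset of the column names from the table above) and reports the final training loss. The numeric values are invented for illustration.

```python
import csv
import io

# Synthetic stand-in for a downloaded results.csv file.
sample = """step,elapsed_tokens,elapsed_examples,training_loss,training_token_accuracy
1,3072,32,2.41,0.55
2,6144,64,1.87,0.68
3,9216,96,1.42,0.79
"""

rows = list(csv.DictReader(io.StringIO(sample)))
final = rows[-1]
print(f"step {final['step']}: training_loss={final['training_loss']}, "
      f"training_token_accuracy={final['training_token_accuracy']}")
```

A decreasing training_loss and increasing training_token_accuracy over steps are the usual signs that the fine-tune job is learning from your data.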
When you're done with your customized model, you can delete the deployment and model. You can also delete the training and validation files you uploaded to the service, if needed.
Delete your model deployment
Important
After a customized model is deployed, if at any time the deployment remains inactive for greater than fifteen (15) days, the deployment is automatically deleted. The deployment of a customized model is "inactive" if the model was deployed more than fifteen (15) days ago and no completions or chat completions calls were made to it during a continuous 15-day period. The deletion of an inactive deployment doesn't delete or affect the underlying customized model, and the customized model can be redeployed at any time. As described in Azure OpenAI Service pricing, each customized (fine-tuned) model that is deployed incurs an hourly hosting cost regardless of whether completions or chat completions calls are being made to the model. To learn more about planning and managing costs with Azure OpenAI, refer to our cost management guide.
You can delete the deployment for your customized model from the Deployments page for Azure OpenAI Studio. Select the deployment to delete, and then select Delete to delete the deployment.
Delete your customized model
You can delete a customized model from the Models page for Azure OpenAI Studio. Select the customized model to delete from the Customized models tab, and then select Delete to delete the customized model.
Note
You cannot delete a customized model if it has an existing deployment. You must first delete your model deployment before you can delete your customized model.
Delete your training files
You can optionally delete training and validation files you've uploaded for training, and result files generated during training, from the File Management page for Azure OpenAI Studio. Select the file to delete, and then select Delete to delete the file.
Explore the full REST API reference documentation to learn more about all the fine-tuning capabilities. You can find the full REST documentation here.
Explore more of the Python SDK operations here.
Article • 05/10/2023
More complex security scenarios require Azure role-based access control (Azure RBAC). This document covers how to authenticate to your OpenAI resource using Azure Active Directory (Azure AD).
In the following sections, you'll use the Azure CLI to assign roles, and obtain a bearer token to call the OpenAI resource. If you get stuck, links are provided in each section with all available options for each command in Azure Cloud Shell/Azure CLI.
An Azure subscription - Create one for free
Access granted to the Azure OpenAI service in the desired Azure subscription
Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access. If you run into a problem, open an issue on this repo to contact us.
Azure CLI - Installation Guide
The following Python libraries: os, requests, json
To sign in to the Azure CLI, run the following command and complete the sign-in. You might need to sign in again if your session has been idle for too long.
Azure CLI
az login
Assigning yourself the Cognitive Services User role allows you to use your account for access to the specific Cognitive Services resource.
Get your user information
Azure CLI
export user=$(az account show -o json | jq -r .user.name)
Assign yourself the "Cognitive Services User" role.
Azure CLI
export resourceId=$(az group show -g $myResourceGroupName -o json | jq -r .id)
az role assignment create --role "Cognitive Services User" --assignee $user --scope $resourceId
Note
Role assignment changes can take approximately five minutes to become effective.
Acquire an Azure AD access token. Access tokens expire after one hour, so you'll then need to acquire another one.
Azure CLI
export accessToken=$(az account get-access-token --resource https://cognitiveservices.azure.com -o json | jq -r .accessToken)
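Since tokens expire after an hour, it can be handy to check a token's expiry locally before retrying a failed call. An Azure AD access token is a JWT, so its exp claim can be read by base64-decoding the payload segment; the token below is synthetic, built purely to demonstrate the decoding, and real tokens should never be logged or shared.

```python
import base64
import json
import time

def token_expiry(jwt: str) -> int:
    """Return the 'exp' (Unix time) claim from a JWT without verifying it."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))["exp"]

# Build a synthetic token whose payload expires one hour from now.
exp = int(time.time()) + 3600
payload = base64.urlsafe_b64encode(json.dumps({"exp": exp}).encode()).rstrip(b"=")
fake_jwt = b"header." + payload + b".signature"

print(token_expiry(fake_jwt.decode()))
```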
Authorization
Make an API call. Use the access token to authorize your API call by setting the Authorization header value.
Bash
curl ${endpoint%/}/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2022-12-01 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $accessToken" \
  -d '{ "prompt": "Once upon a time" }'
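The same call can be assembled in Python with only the standard library. The sketch below builds the request without sending it; the endpoint, deployment name, and token are placeholders to substitute with your own values.

```python
import json
import urllib.request

endpoint = "https://YOUR_RESOURCE_NAME.openai.azure.com"   # placeholder
deployment = "YOUR_DEPLOYMENT_NAME"                        # placeholder
access_token = "YOUR_BEARER_TOKEN"  # from `az account get-access-token`

url = (f"{endpoint}/openai/deployments/{deployment}"
       f"/completions?api-version=2022-12-01")
req = urllib.request.Request(
    url,
    data=json.dumps({"prompt": "Once upon a time"}).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}",
    },
)
print(req.full_url)
# urllib.request.urlopen(req) would send the request; omitted here.
```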
OpenAI supports Azure Active Directory (Azure AD) authentication with managed identities for Azure resources. Managed identities for Azure resources can authorize access to Cognitive Services resources using Azure AD credentials from applications running in Azure virtual machines (VMs), function apps, virtual machine scale sets, and other services. By using managed identities for Azure resources together with Azure AD authentication, you can avoid storing credentials with your applications that run in the cloud.
Before you can use managed identities for Azure resources to authorize access to Cognitive Services resources from your VM, you must enable managed identities for Azure resources on the VM. To learn how to enable managed identities for Azure Resources, see:
Azure portal Azure PowerShell Azure CLI
Azure Resource Manager template Azure Resource Manager client libraries
For more information about managed identities, see Managed identities for Azure resources.
Article • 03/08/2023
An application that accesses a Cognitive Services resource when network rules are in effect requires authorization. Authorization is supported with Azure Active Directory (Azure AD) credentials or with a valid API key.
Important
Turning on firewall rules for your Cognitive Services account blocks incoming requests for data by default. In order to allow requests through, one of the following conditions needs to be met:
The request should originate from a service operating within an Azure Virtual Network (VNet) on the allowed subnet list of the target Cognitive Services account. The endpoint in requests originated from VNet needs to be set as the custom subdomain of your Cognitive Services account.
Or the request should originate from an allowed list of IP addresses.
Requests that are blocked include those from other Azure services, from the Azure portal, from logging and metrics services, and so on.
Note
We recommend that you use the Azure Az PowerShell module to interact with Azure. See Install Azure PowerShell to get started. To learn how to migrate to the Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.
To secure your Cognitive Services resource, you should first configure a rule to deny access to traffic from all networks (including internet traffic) by default. Then, you should configure rules that grant access to traffic from specific VNets. This configuration enables you to build a secure network boundary for your applications. You can also configure rules to grant access to traffic from select public internet IP address ranges, enabling connections from specific internet or on-premises clients.
Network rules are enforced on all network protocols to Azure Cognitive Services, including REST and WebSocket. To access data using tools such as the Azure test consoles, explicit network rules must be configured. You can apply network rules to existing Cognitive Services resources, or when you create new Cognitive Services resources. Once network rules are applied, they're enforced for all requests.
Virtual networks (VNets) are supported in regions where Cognitive Services are available. Cognitive Services supports service tags for network rules configuration. The services listed below are included in the CognitiveServicesManagement service tag.
Anomaly Detector
Azure OpenAI
Computer Vision
Content Moderator
Custom Vision
Face
Language Understanding (LUIS)
Personalizer
Speech service
Language service
QnA Maker
Translator Text
Note
If you're using Azure OpenAI, LUIS, Speech Services, or Language services, the CognitiveServicesManagement tag only enables you to use the service with the SDK or REST API. To access and use Azure OpenAI Studio, the LUIS portal, Speech Studio, or Language Studio from a virtual network, you will need to use the following tags:
AzureActiveDirectory AzureFrontDoor.Frontend
AzureResourceManager
CognitiveServicesManagement CognitiveServicesFrontEnd
By default, Cognitive Services resources accept connections from clients on any network. To limit access to selected networks, you must first change the default action.
Warning
Making changes to network rules can impact your applications' ability to connect to Azure Cognitive Services. Setting the default network rule to deny blocks all access to the data unless specific network rules that grant access are also applied. Be sure to grant access to any allowed networks using network rules before you change the default rule to deny access. If you are allowlisting IP addresses for your on-premises network, be sure to add all possible outgoing public IP addresses from your on-premises network.
Managing default network access rules
You can manage default network access rules for Cognitive Services resources through the Azure portal, PowerShell, or the Azure CLI.
Go to the Cognitive Services resource you want to secure.
On the RESOURCE MANAGEMENT menu, select Virtual network.
Azure portal
You can configure Cognitive Services resources to allow access only from specific subnets. The allowed subnets may belong to a VNet in the same subscription, or in a different subscription, including subscriptions belonging to a different Azure Active Directory tenant.
Enable a service endpoint for Azure Cognitive Services within the VNet. The service endpoint routes traffic from the VNet through an optimal path to the Azure Cognitive Services service. The identities of the subnet and the virtual network are also transmitted with each request. Administrators can then configure network rules for the Cognitive Services resource that allow requests to be received from specific subnets in a VNet.
Clients granted access via these network rules must continue to meet the authorization requirements of the Cognitive Services resource to access the data.
Each Cognitive Services resource supports up to 100 virtual network rules, which may be combined with IP network rules.
Required permissions
To apply a virtual network rule to a Cognitive Services resource, the user must have the appropriate permissions for the subnets being added. The required permission is the default Contributor role, or the Cognitive Services Contributor role. Required permissions can also be added to custom role definitions.
Cognitive Services resource and the virtual networks granted access may be in different subscriptions, including subscriptions that are a part of a different Azure AD tenant.
Note
Configuration of rules that grant access to subnets in virtual networks that are a part of a different Azure Active Directory tenant is currently only supported through PowerShell, CLI, and REST APIs. Such rules cannot be configured through the Azure portal, though they may be viewed in the portal.
Managing virtual network rules
You can manage virtual network rules for Cognitive Services resources through the Azure portal, PowerShell, or the Azure CLI.
Go to the Cognitive Services resource you want to secure.
On the RESOURCE MANAGEMENT menu, select Virtual network.
Check that you've selected to allow access from Selected networks.
To grant access to a virtual network with an existing network rule, under
Virtual networks, select Add existing virtual network.
Azure portal
Select the Virtual networks and Subnets options, and then select Enable.
Note
If a service endpoint for Azure Cognitive Services wasn't previously configured for the selected virtual network and subnets, you can configure it as part of this operation. Presently, only virtual networks belonging to the same Azure Active Directory tenant are shown for selection during rule creation. To grant access to a subnet in a virtual network belonging to another tenant, please use PowerShell, CLI or REST APIs.
Important
Be sure to set the default rule to deny, or network rules have no effect.
You can configure Cognitive Services resources to allow access from specific public internet IP address ranges. This configuration grants access to specific services and on-premises networks, effectively blocking general internet traffic.
Provide allowed internet address ranges using CIDR notation in the form 16.17.18.0/24, or as individual IP addresses like 16.17.18.19 .
Tip
Small address ranges using "/31" or "/32" prefix sizes are not supported. These ranges should be configured using individual IP address rules.
IP network rules are only allowed for public internet IP addresses. IP address ranges reserved for private networks (as defined in RFC 1918 ) aren't allowed in IP rules.
Private networks include addresses that start with 10.* , addresses in the range 172.16.* - 172.31.* , and 192.168.* .
Only IPv4 addresses are supported at this time. Each Cognitive Services resource supports up to 100 IP network rules, which may be combined with virtual network rules.
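The constraints above (public IPv4 only, no RFC 1918 addresses, no /31 or /32 ranges) can be checked locally before you add a rule. The helper below is a sketch using Python's standard ipaddress module; the rule strings are illustrative.

```python
import ipaddress

def valid_ip_rule(rule: str) -> bool:
    """Sketch of the documented IP-rule constraints for Cognitive Services."""
    try:
        net = ipaddress.ip_network(rule, strict=False)
    except ValueError:
        return False
    if net.version != 4:        # only IPv4 is supported
        return False
    if net.is_private:          # RFC 1918 ranges aren't allowed
        return False
    if "/" in rule and net.num_addresses <= 2:  # /31 and /32 ranges unsupported
        return False
    return True

print(valid_ip_rule("16.17.18.0/24"))  # public range in CIDR notation
print(valid_ip_rule("16.17.18.19"))    # individual public address
print(valid_ip_rule("10.0.0.0/24"))    # private range: rejected
```

Small address ranges that fail the last check should instead be configured as individual IP address rules.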
Configuring access from on-premises networks
To grant access from your on-premises networks to your Cognitive Services resource with an IP network rule, you must identify the internet facing IP addresses used by your network. Contact your network administrator for help.
If you're using ExpressRoute on-premises for public peering or Microsoft peering, you'll need to identify the NAT IP addresses. For public peering, each ExpressRoute circuit by default uses two NAT IP addresses. Each is applied to Azure service traffic when the traffic enters the Microsoft Azure network backbone. For Microsoft peering, the NAT IP addresses that are used are either customer provided or are provided by the service provider. To allow access to your service resources, you must allow these public IP addresses in the resource IP firewall setting. To find your public peering ExpressRoute circuit IP addresses, open a support ticket with ExpressRoute via the Azure portal.
Learn more about NAT for ExpressRoute public and Microsoft peering.
Managing IP network rules
You can manage IP network rules for Cognitive Services resources through the Azure portal, PowerShell, or the Azure CLI.
Go to the Cognitive Services resource you want to secure.
On the RESOURCE MANAGEMENT menu, select Virtual network.
Check that you've selected to allow access from Selected networks.
To grant access to an internet IP range, enter the IP address or address range (in CIDR format ) under Firewall > Address Range. Only valid public IP (non-reserved) addresses are accepted.
Azure portal
Important
Be sure to set the default rule to deny, or network rules have no effect.
You can use private endpoints for your Cognitive Services resources to allow clients on a virtual network (VNet) to securely access data over a Private Link. The private endpoint uses an IP address from the VNet address space for your Cognitive Services resource.
Network traffic between the clients on the VNet and the resource traverses the VNet and a private link on the Microsoft backbone network, eliminating exposure from the public internet.
Private endpoints for Cognitive Services resources let you:
Secure your Cognitive Services resource by configuring the firewall to block all connections on the public endpoint for the Cognitive Services service.
Increase security for the VNet, by enabling you to block exfiltration of data from the VNet.
Securely connect to Cognitive Services resources from on-premises networks that connect to the VNet using VPN or ExpressRoutes with private-peering.
Conceptual overview
A private endpoint is a special network interface for an Azure resource in your VNet. Creating a private endpoint for your Cognitive Services resource provides secure connectivity between clients in your VNet and your resource. The private endpoint is assigned an IP address from the IP address range of your VNet. The connection between the private endpoint and the Cognitive Services service uses a secure private link.
Applications in the VNet can connect to the service over the private endpoint seamlessly, using the same connection strings and authorization mechanisms that they would use otherwise. The exception is the Speech Services, which require a separate endpoint. See the section on Private endpoints with the Speech Services. Private endpoints can be used with all protocols supported by the Cognitive Services resource, including REST.
Private endpoints can be created in subnets that use Service Endpoints. Clients in a subnet can connect to one Cognitive Services resource using private endpoint, while using service endpoints to access others.
When you create a private endpoint for a Cognitive Services resource in your VNet, a consent request is sent for approval to the Cognitive Services resource owner. If the user requesting the creation of the private endpoint is also an owner of the resource, this consent request is automatically approved.
Cognitive Services resource owners can manage consent requests and the private endpoints, through the 'Private endpoints' tab for the Cognitive Services resource in the Azure portal .
Private endpoints
When creating the private endpoint, you must specify the Cognitive Services resource it connects to. For more information on creating a private endpoint, see:
Create a private endpoint using the Private Link Center in the Azure portal Create a private endpoint using Azure CLI
Create a private endpoint using Azure PowerShell
Connecting to private endpoints
Note
Azure OpenAI Service uses a different private DNS zone and public DNS zone forwarder than other Azure Cognitive Services. Refer to the Azure services DNS zone configuration article for the correct zone and forwarder names.
Clients on a VNet using the private endpoint should use the same connection string for the Cognitive Services resource as clients connecting to the public endpoint. The exception is the Speech Services, which require a separate endpoint. See the section on Private endpoints with the Speech Services. We rely upon DNS resolution to automatically route the connections from the VNet to the Cognitive Services resource over a private link.
We create a private DNS zone attached to the VNet with the necessary updates for the private endpoints, by default. However, if you're using your own DNS server, you may need to make additional changes to your DNS configuration. The section on DNS changes below describes the updates required for private endpoints.
Private endpoints with the Speech Services
See Using Speech Services with private endpoints provided by Azure Private Link.
DNS changes for private endpoints
When you create a private endpoint, the DNS CNAME resource record for the Cognitive Services resource is updated to an alias in a subdomain with the prefix 'privatelink'. By default, we also create a private DNS zone, corresponding to the 'privatelink' subdomain, with the DNS A resource records for the private endpoints.
When you resolve the endpoint URL from outside the VNet with the private endpoint, it resolves to the public endpoint of the Cognitive Services resource. When resolved from the VNet hosting the private endpoint, the endpoint URL resolves to the private endpoint's IP address.
This approach enables access to the Cognitive Services resource using the same connection string for clients in the VNet hosting the private endpoints and clients outside the VNet.
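A quick way to verify which endpoint a client actually reaches is to resolve the hostname and check whether the returned address is private. The sketch below uses only the standard library; a real check would pass your resource's FQDN (for example, a placeholder like YOUR_RESOURCE.openai.azure.com), while the demo uses literal IP addresses so no DNS lookup is needed.

```python
import ipaddress
import socket

def resolves_privately(host: str) -> bool:
    """True if `host` resolves to a private (VNet) address."""
    addr = socket.gethostbyname(host)
    return ipaddress.ip_address(addr).is_private

# From inside the VNet, the private endpoint hostname should resolve privately;
# from outside, it resolves to the public endpoint. Demonstrated here with
# literal addresses rather than a live lookup:
print(resolves_privately("10.1.2.3"))     # private (VNet-style) address
print(resolves_privately("20.42.65.90"))  # public address
```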
If you are using a custom DNS server on your network, clients must be able to resolve the fully qualified domain name (FQDN) for the Cognitive Services resource endpoint to the private endpoint IP address. Configure your DNS server to delegate your private link subdomain to the private DNS zone for the VNet.
Tip
When using a custom or on-premises DNS server, you should configure your DNS server to resolve the Cognitive Services resource name in the 'privatelink' subdomain to the private endpoint IP address. You can do this by delegating the 'privatelink' subdomain to the private DNS zone of the VNet, or configuring the DNS zone on your DNS server and adding the DNS A records.
For more information on configuring your own DNS server to support private endpoints, refer to the following articles:
Name resolution for resources in Azure virtual networks DNS configuration for private endpoints
Pricing
For pricing details, see Azure Private Link pricing .
Explore the various Azure Cognitive Services
Learn more about Azure Virtual Network Service Endpoints
Article • 02/17/2023 • 5 minutes to read
Azure OpenAI can be used to solve a large number of natural language tasks by prompting the completion API. To make it easier to scale your prompting workflows from a few examples to large datasets of examples, we have integrated the Azure OpenAI service with the distributed machine learning library SynapseML. This integration makes it easy to use the Apache Spark distributed computing framework to process millions of prompts with the OpenAI service. This tutorial shows how to apply large language models at a distributed scale using Azure OpenAI and Azure Synapse Analytics.
An Azure subscription - Create one for free
Access granted to Azure OpenAI in the desired Azure subscription
Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access. If you run into a problem, open an issue on this repo to contact us.
An Azure OpenAI resource – create a resource
An Apache Spark cluster with SynapseML installed - create a serverless Apache Spark pool here
We recommend creating a Synapse workspace, but an Azure Databricks, HDInsight, or Spark on Kubernetes cluster, or even a Python environment with the pyspark package, will also work.
The next step is to add this code into your Spark cluster. You can either create a notebook in your Spark platform and copy the code into this notebook to run the demo, or download the notebook and import it into Synapse Analytics.
Download this demo as a notebook (click Raw, then save the file)
Import the notebook into the Synapse Workspace or, if using Databricks, into the Databricks Workspace
Install SynapseML on your cluster. See the installation instructions for Synapse at the bottom of the SynapseML website . This requires pasting another cell at the top of the notebook you imported
Connect your notebook to a cluster and follow along, editing and running the cells below.
Next, edit the cell in the notebook to point to your service. In particular, set the resource_name, deployment_name, location, and key variables to the corresponding values for your Azure OpenAI resource.
Important
Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.
Python
import os

# Replace the following values with your Azure OpenAI resource information
resource_name = "RESOURCE_NAME"      # The name of your Azure OpenAI resource.
deployment_name = "DEPLOYMENT_NAME"  # The name of your Azure OpenAI deployment.
location = "RESOURCE_LOCATION"       # The location or region ID for your resource.
key = "RESOURCE_API_KEY"             # The key for your resource.

assert key is not None and resource_name is not None
Next, create a dataframe consisting of a series of rows, with one prompt per row.
You can also load data directly from Azure Data Lake Storage (ADLS) or other databases. For more information about loading and preparing Spark dataframes, see the Apache Spark data loading guide .
Python
df = spark.createDataFrame(
    [
        ("Hello my name is",),
        ("The best code is code that's",),
        ("SynapseML is ",),
    ]
).toDF("prompt")
To apply the OpenAI Completion service to the dataframe that you just created, create an OpenAICompletion object that serves as a distributed client. Parameters of the service can be set either with a single value, or by a column of the dataframe with the appropriate setters on the OpenAICompletion object. Here, we're setting maxTokens to 200. A token is around four characters, and this limit applies to the sum of the prompt and the result. We're also setting the promptCol parameter with the name of the prompt column in the dataframe.
Python
from synapse.ml.cognitive import OpenAICompletion

completion = (
    OpenAICompletion()
    .setSubscriptionKey(key)
    .setDeploymentName(deployment_name)
    .setUrl("https://{}.openai.azure.com/".format(resource_name))
    .setMaxTokens(200)
    .setPromptCol("prompt")
    .setErrorCol("error")
    .setOutputCol("completions")
)
Now that you have the dataframe and the completion client, you can transform your input dataset and add a column called completions with all of the information the service adds. We'll select out just the text for simplicity.
Python
from pyspark.sql.functions import col

completed_df = completion.transform(df).cache()
display(
    completed_df.select(
        col("prompt"),
        col("error"),
        col("completions.choices.text").getItem(0).alias("text"),
    )
)
Your output should look something like the following example; note that the completion text can vary.
prompt | error | text |
Hello my name is | undefined | Makaveli I'm eighteen years old and I want to be a rapper when I grow up I love writing and making music I'm from Los Angeles, CA |
The best code is code that's | undefined | understandable This is a subjective statement, and there is no definitive answer. |
SynapseML is | undefined | A machine learning algorithm that is able to learn how to predict the future outcome of events. |
Improve throughput with request batching
The example above makes several requests to the service, one for each prompt. To complete multiple prompts in a single request, use batch mode. First, in the OpenAICompletion object, instead of setting the prompt column, specify "batchPrompt" for the batchPrompt column. To do so, create a dataframe with a list of prompts per row.
Note
There is currently a limit of 20 prompts in a single request and a limit of 2048 "tokens", or approximately 1500 words.
Python |
batch_df = spark.createDataFrame( [ (["The time has come", "Pleased to", "Today stocks", "Here's to"],), (["The only thing", "Ask not what", "Every litter", "I am"],), |
] ).toDF("batchPrompt") |
Next we create the `OpenAICompletion` object. Rather than setting the prompt column, set the `batchPrompt` column if your column is of type `Array[String]`.
```python
batch_completion = (
    OpenAICompletion()
    .setSubscriptionKey(key)
    .setDeploymentName(deployment_name)
    .setUrl("https://{}.openai.azure.com/".format(resource_name))
    .setMaxTokens(200)
    .setBatchPromptCol("batchPrompt")
    .setErrorCol("error")
    .setOutputCol("completions")
)
```
In the call to transform, a request will then be made per row. Because there are multiple prompts in a single row, each request will be sent with all prompts in that row. The results will contain a row for each row in the request.
```python
completed_batch_df = batch_completion.transform(batch_df).cache()
display(completed_batch_df)
```
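If you have more than 20 prompts, a small helper can split them into request-sized rows before building the `batchPrompt` dataframe. This is a minimal sketch; the 20-prompt limit comes from the note above:

```python
def chunk_prompts(prompts, max_per_request=20):
    # Split a flat list of prompts into sublists that respect the
    # 20-prompts-per-request limit; each sublist becomes one dataframe row.
    return [prompts[i:i + max_per_request]
            for i in range(0, len(prompts), max_per_request)]

rows = chunk_prompts([f"Prompt {i}" for i in range(45)])
print([len(r) for r in rows])  # -> [20, 20, 5]
```

Each sublist can then be loaded as one row of the `batchPrompt` column, as in the `spark.createDataFrame` example above.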
Using an automatic mini-batcher
If your data is in column format, you can transpose it to row format using SynapseML's `FixedMiniBatchTransformer`.
```python
from pyspark.sql.types import StringType
from synapse.ml.stages import FixedMiniBatchTransformer
from synapse.ml.core.spark import FluentAPI

completed_autobatch_df = (df
    # Force a single partition so that our little 4-row dataframe makes a
    # batch of size 4; you can remove this step for large datasets.
    .coalesce(1)
    .mlTransform(FixedMiniBatchTransformer(batchSize=4))
    .withColumnRenamed("prompt", "batchPrompt")
    .mlTransform(batch_completion))

display(completed_autobatch_df)
```
Prompt engineering for translation
Azure OpenAI can solve many different natural language tasks through prompt engineering. Here, we show an example of prompting for language translation:
```python
translate_df = spark.createDataFrame(
    [
        ("Japanese: Ookina hako \nEnglish: Big box \nJapanese: Midori tako\nEnglish:",),
        ("French: Quelle heure est-il à Montréal? \nEnglish: What time is it in Montreal? \nFrench: Où est le poulet? \nEnglish:",),
    ]
).toDF("prompt")

display(completion.transform(translate_df))
```
Prompt for question answering
Here, we prompt the GPT-3 model for general-knowledge question answering:
```python
qa_df = spark.createDataFrame(
    [
        (
            "Q: Where is the Grand Canyon?\nA: The Grand Canyon is in Arizona.\n\nQ: What is the weight of the Burj Khalifa in kilograms?\nA:",
        )
    ]
).toDF("prompt")

display(completion.transform(qa_df))
```
Article • 04/21/2023
Azure OpenAI automatically encrypts your data when it's persisted to the cloud. The encryption protects your data and helps you meet your organizational security and compliance commitments. This article covers how Azure OpenAI handles encryption of data at rest, specifically training data and fine-tuned models. For information on how data provided by you to the service is processed, used, and stored, consult the data, privacy, and security article.
Azure OpenAI is part of Azure Cognitive Services. Cognitive Services data is encrypted and decrypted using FIPS 140-2 compliant 256-bit AES encryption. Encryption and decryption are transparent, meaning encryption and access are managed for you. Your data is secure by default and you don't need to modify your code or applications to take advantage of encryption.
By default, your subscription uses Microsoft-managed encryption keys. There's also the option to manage your subscription with your own keys called customer-managed keys (CMK). CMK offers greater flexibility to create, rotate, disable, and revoke access controls. You can also audit the encryption keys used to protect your data.
Customer-managed keys (CMK), also known as Bring your own key (BYOK), offer greater flexibility to create, rotate, disable, and revoke access controls. You can also audit the encryption keys used to protect your data.
You must use Azure Key Vault to store your customer-managed keys. You can either create your own keys and store them in a key vault, or you can use the Azure Key Vault APIs to generate keys. The Cognitive Services resource and the key vault must be in the same region and in the same Azure Active Directory (Azure AD) tenant, but they can be in different subscriptions. For more information about Azure Key Vault, see What is Azure Key Vault?.
To request the ability to use customer-managed keys, fill out and submit the Cognitive Services Customer-Managed Key Request Form . It will take approximately 3-5 business days to hear back on the status of your request.
To enable customer-managed keys, you must also enable both the Soft Delete and Do Not Purge properties on the key vault.
Only RSA keys of size 2048 are supported with Cognitive Services encryption. For more information about keys, see Key Vault keys in About Azure Key Vault keys, secrets and certificates.
To enable customer-managed keys in the Azure portal, follow these steps:
Go to your Cognitive Services resource.
On the left, select Encryption.
Under Encryption type, select Customer Managed Keys, as shown in the following screenshot.
After you enable customer-managed keys, you can specify a key to associate with the Cognitive Services resource.
Specify a key as a URI
To specify a key as a URI, follow these steps:
In the Azure portal, go to your key vault.
Under Settings, select Keys.
Select the desired key, and then select the key to view its versions. Select a key version to view the settings for that version.
Copy the Key Identifier value, which provides the URI.
Go back to your Cognitive Services resource, and then select Encryption.
Under Encryption key, select Enter key URI.
Paste the URI that you copied into the Key URI box.
Under Subscription, select the subscription that contains the key vault.
Save your changes.
Specify a key from a key vault
To specify a key from a key vault, first make sure that you have a key vault that contains a key. Then follow these steps:
Go to your Cognitive Services resource, and then select Encryption.
Under Encryption key, select Select from Key Vault.
Select the key vault that contains the key that you want to use.
Select the key that you want to use.
Save your changes.
When you create a new version of a key, update the Cognitive Services resource to use the new version. Follow these steps:
Go to your Cognitive Services resource, and then select Encryption.
Enter the URI for the new key version. Alternately, you can select the key vault and then select the key again to update the version.
Save your changes.
To change the key that you use for encryption, follow these steps:
Go to your Cognitive Services resource, and then select Encryption.
Enter the URI for the new key. Alternately, you can select the key vault and then select a new key.
Save your changes.
You can rotate a customer-managed key in Key Vault according to your compliance policies. When the key is rotated, you must update the Cognitive Services resource to use the new key URI. To learn how to update the resource to use a new version of the key in the Azure portal, see Update the key version.
Rotating the key doesn't trigger re-encryption of data in the resource. No further action is required from the user.
You can revoke a customer-managed encryption key by changing the access policy, by changing the permissions on the key vault, or by deleting the key.
To change the access policy of the managed identity that your registry uses, run the `az keyvault delete-policy` command:
```azurecli
az keyvault delete-policy \
  --resource-group <resource-group-name> \
  --name <key-vault-name> \
  --object-id <managed-identity-principal-id>
```
To delete the individual versions of a key, run the `az keyvault key delete` command. This operation requires the keys/delete permission.
```azurecli
az keyvault key delete \
  --vault-name <key-vault-name> \
  --name <key-name>
```
Important
Revoking access to an active customer-managed key while CMK is still enabled will prevent downloading of training data and results files, fine-tuning new models, and deploying fine-tuned models. However, previously deployed fine-tuned models will continue to operate and serve traffic until those deployments are deleted.
Delete training, validation, and training results data
The Files API allows customers to upload their training data for the purpose of fine-tuning a model. This data is stored in Azure Storage, within the same region as the resource, and logically isolated with your Azure subscription and API credentials.
Uploaded files can be deleted by the user via the DELETE API operation.
Delete fine-tuned models and deployments
When you disable customer-managed keys, your Cognitive Services resource is then encrypted with Microsoft-managed keys. To disable customer-managed keys, follow these steps:
Go to your Cognitive Services resource, and then select Encryption.
Select Microsoft Managed Keys > Save.
Important
If you disable system-assigned managed identities, access to the key vault will be removed and any data encrypted with the customer keys will no longer be accessible. Any features that depend on this data will stop working.
Important
Managed identities do not currently support cross-directory scenarios. When you configure customer-managed keys in the Azure portal, a managed identity is automatically assigned under the covers. If you subsequently move the subscription, resource group, or resource from one Azure AD directory to another, the managed identity associated with the resource is not transferred to the new tenant, so customer-managed keys may no longer work. For more information, see Transferring a subscription between Azure AD directories in FAQs and known issues with managed identities for Azure resources.
Language service Customer-Managed Key Request Form Learn more about Azure Key Vault
Article • 02/17/2023 • 2 minutes to read
Azure OpenAI is available in two regions. Since subscription keys are region bound, when a customer acquires a key, they select the region in which their deployments will reside and from then on, all operations stay associated with that Azure server region.
It's rare, but not impossible, to encounter a network issue that hits an entire region. If your service needs to always be available, then you should design it to either fail-over into another region or split the workload between two or more regions. Both approaches require at least two OpenAI resources in different regions. This article provides general recommendations for how to implement Business Continuity and Disaster Recovery (BCDR) for your Azure OpenAI applications.
Today customers will call the endpoint provided during deployment for both deployments and inference. These operations are stateless, so no data is lost in the case that a region becomes unavailable.
If a region is non-operational, customers must take steps to ensure service continuity.
The following set of instructions applies to both customers using default endpoints and those using custom endpoints.
Default endpoint recovery
If you're using a default endpoint, you should configure your client code to monitor errors, and if the errors persist, be prepared to redirect to another region of your choice where you have an Azure OpenAI subscription.
Follow these steps to configure your client to monitor errors:
1. Use this page to identify the list of available regions for the Azure OpenAI service.
2. Select a primary and a secondary/backup region from the list.
3. Create Azure OpenAI resources for each selected region.
4. For the primary region and any backup regions, your code will need to know:
   - The base URI for the resource
   - The regional access key or Azure Active Directory access
5. Configure your code to monitor connectivity errors (typically connection timeouts and service unavailability errors).
   - Given that networks yield transient errors, retry on a single connectivity issue occurrence.
   - For persistent errors, redirect traffic to the backup resource in the region you've created.
The recovery from regional failures for this usage type can be performed almost instantly and at very low cost. This does, however, require custom development of this functionality on the client side of your application.
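A minimal sketch of that client-side logic, assuming two hypothetical resources and a simple retry policy (the endpoint names and retry counts are illustrative, not a prescribed implementation):

```python
ENDPOINTS = [
    "https://primary-resource.openai.azure.com/",  # primary region
    "https://backup-resource.openai.azure.com/",   # backup region
]

def call_with_failover(make_request, endpoints=ENDPOINTS, retries=2):
    """Retry transient errors on the current endpoint, then fail over
    to the next region on persistent connectivity errors."""
    last_error = None
    for endpoint in endpoints:
        for _ in range(retries):
            try:
                return make_request(endpoint)
            except (ConnectionError, TimeoutError) as err:
                last_error = err  # transient error or regional outage
    raise last_error
```

In production you would typically add exponential backoff between retries and distinguish retryable responses (such as timeouts and 5xx errors) from client errors that a different region won't fix.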
Article • 03/15/2023 • 5 minutes to read
When you have critical applications and business processes relying on Azure resources, you want to monitor those resources for their availability, performance, and operation.
This article describes the monitoring data generated by Azure OpenAI Service. Azure OpenAI is part of Cognitive Services, which uses Azure Monitor. If you're unfamiliar with the features of Azure Monitor common to all Azure services that use it, read Monitoring Azure resources with Azure Monitor.
Monitoring data
Azure OpenAI collects the same kinds of monitoring data as other Azure resources that are described in Monitoring data from Azure resources.
Collection and routing
Platform metrics and the Activity log are collected and stored automatically, but can be routed to other locations by using a diagnostic setting.
Resource Logs aren't collected and stored until you create a diagnostic setting and route them to one or more locations.
See Create diagnostic setting to collect platform logs and metrics in Azure for the detailed process for creating a diagnostic setting using the Azure portal, CLI, or PowerShell. When you create a diagnostic setting, you specify which categories of logs to collect.
Keep in mind that using diagnostic settings and sending data to Azure Monitor Logs has additional costs associated with it. To understand more, consult the Azure Monitor cost calculation guide.
The metrics and logs you can collect are discussed in the following sections.
Analyzing metrics
You can analyze metrics for Azure OpenAI by opening Metrics, found under the Monitoring section when viewing your Azure OpenAI resource in the Azure portal. See Getting started with Azure Metrics Explorer for details on using this tool.
Azure OpenAI is a part of Cognitive Services. For a list of all platform metrics collected for Cognitive Services and Azure OpenAI, see Cognitive Services supported metrics.
For the current subset of metrics available in Azure OpenAI:
Azure OpenAI Metrics
| Metric | Exportable via Diagnostic Settings? | Metric Display Name | Unit | Aggregation Type | Description | Dimensions |
| --- | --- | --- | --- | --- | --- | --- |
| BlockedCalls | Yes | Blocked Calls | Count | Total | Number of calls that exceeded rate or quota limit. | ApiName, OperationName, Region, RatelimitKey |
| ClientErrors | Yes | Client Errors | Count | Total | Number of calls with client side error (HTTP response code 4xx). | ApiName, OperationName, Region, RatelimitKey |
| DataIn | Yes | Data In | Bytes | Total | Size of incoming data in bytes. | ApiName, OperationName, Region |
| DataOut | Yes | Data Out | Bytes | Total | Size of outgoing data in bytes. | ApiName, OperationName, Region |
| FineTunedTrainingHours | Yes | Processed FineTuned Training Hours | Count | Total | Number of training hours processed on an OpenAI fine-tuned model. | ApiName, ModelDeploymentName, FeatureName, UsageChannel, Region |
| Latency | Yes | Latency | MilliSeconds | Average | Latency in milliseconds. | ApiName, OperationName, Region, RatelimitKey |
| Ratelimit | Yes | Ratelimit | Count | Total | The current ratelimit of the ratelimit key. | Region, RatelimitKey |
| ServerErrors | Yes | Server Errors | Count | Total | Number of calls with service internal error (HTTP response code 5xx). | ApiName, OperationName, Region, RatelimitKey |
| SuccessfulCalls | Yes | Successful Calls | Count | Total | Number of successful calls. | ApiName, OperationName, Region, RatelimitKey |
| TokenTransaction | Yes | Processed Inference Tokens | Count | Total | Number of inference tokens processed on an OpenAI model. | ApiName, ModelDeploymentName, FeatureName, UsageChannel, Region |
| TotalCalls | Yes | Total Calls | Count | Total | Total number of calls. | ApiName, OperationName, Region, RatelimitKey |
| TotalErrors | Yes | Total Errors | Count | Total | Total number of calls with error response (HTTP response code 4xx or 5xx). | ApiName, OperationName, Region, RatelimitKey |
Analyzing logs
Data in Azure Monitor Logs is stored in tables where each table has its own set of unique properties.
All resource logs in Azure Monitor have the same fields followed by service-specific fields. The common schema is outlined in Azure Monitor resource log schema.
The Activity log is a type of platform log in Azure that provides insight into subscription-level events. You can view it independently or route it to Azure Monitor Logs, where you can do much more complex queries using Log Analytics.
For a list of the types of resource logs available for Azure OpenAI and other Cognitive Services, see Resource provider operations for Cognitive Services
Kusto queries
Important
When you select Logs from the Azure OpenAI menu, Log Analytics is opened with the query scope set to the current Azure OpenAI resource. This means that log queries will only include data from that resource. If you want to run a query that includes data from other resources or data from other Azure services, select Logs from the Azure Monitor menu. See Log query scope and time range in Azure Monitor Log Analytics for details.
To explore and get a sense of what type of information is available for your Azure OpenAI resource, a useful query to start with, once you've deployed a model and sent some completion calls through the playground, is:
```kusto
AzureDiagnostics
| take 100
| project TimeGenerated, _ResourceId, Category, OperationName, DurationMs, ResultSignature, properties_s
```
Here we return a sample of 100 entries and display a subset of the available columns of data in the logs.
If you wish to see all available columns of data, you can remove the scoping provided by the `| project` line:

```kusto
AzureDiagnostics
| take 100
```
You can also select the arrow next to the table name to view all available columns and associated data types.
To examine AzureMetrics, run:

```kusto
AzureMetrics
| take 100
| project TimeGenerated, MetricName, Total, Count, TimeGrain, UnitName
```
Alerts
Azure Monitor alerts proactively notify you when important conditions are found in your monitoring data. They allow you to identify and address issues in your system before your customers notice them. You can set alerts on metrics, logs, and the activity log. Different types of alerts have different benefits and drawbacks.
Every organization's alerting needs are going to vary, and will also evolve over time. Generally all alerts should be actionable, with a specific intended response if the alert occurs. If there's no action for someone to take, then it might be something you want to capture in a report, but not in an alert. Some use cases may require alerting anytime certain error conditions exist. But in many environments, it might only be in cases where errors exceed a certain threshold for a period of time where sending an alert is warranted.
Errors below certain thresholds can often be evaluated through regular analysis of data in Azure Monitor Logs. As you analyze your log data over time, you may also find that a certain condition not occurring for a long enough period of time might be valuable to track with alerts. Sometimes the absence of an event in a log is just as important a signal as an error.
Depending on what type of application you're developing in conjunction with your use of Azure OpenAI, Azure Monitor Application Insights may offer additional monitoring benefits at the application layer.
Next steps
See Monitoring Azure resources with Azure Monitor for details on monitoring Azure resources. Read Understand log searches in Azure Monitor logs.
Additional resources
Documentation
Plan to manage costs for Azure OpenAI Service - Azure Cognitive Services
Learn how to plan for and manage costs for Azure OpenAI by using cost analysis in the Azure portal.
Azure OpenAI Service encryption of data at rest - Azure Cognitive Services
Learn how Azure OpenAI encrypts your data when it's persisted to the cloud.
Models - REST API (Azure Cognitive Services)
Learn more about [Cognitive Services Models Operations]. How to [Get,List].
How to Configure Azure OpenAI Service with Managed Identities - Azure OpenAI
Provides guidance on how to set managed identity with Azure Active Directory
Azure OpenAI Service content filtering - Azure OpenAI
Learn about the content filtering capabilities of Azure OpenAI in Azure Cognitive Services
How-to - Use Azure OpenAI Service with large datasets - Azure OpenAI
Walkthrough on how to integrate Azure OpenAI with SynapseML and Apache Spark to apply large language models at a distributed scale.
Deployments - REST API (Azure Cognitive Services)
Learn more about [Cognitive Services Deployments Operations]. How to [Create,Delete,Get,List,Update].
Fine Tunes - REST API (Azure Cognitive Services)
Learn more about [Cognitive Services Fine Tunes Operations]. How to [Cancel,Create,Delete,Get,Get Events,List].
Article • 04/05/2023 • 7 minutes to read
This article describes how you plan for and manage costs for Azure OpenAI Service. Before you deploy the service, you can use the Azure pricing calculator to estimate costs for Azure OpenAI. Later, as you deploy Azure resources, review the estimated costs.
After you've started using Azure OpenAI resources, use Cost Management features to set budgets and monitor costs. You can also review forecasted costs and identify spending trends to identify areas where you might want to act. Costs for Azure OpenAI Service are only a portion of the monthly costs in your Azure bill. Although this article explains how to plan for and manage costs for Azure OpenAI, you're billed for all Azure services and resources used in your Azure subscription, including the third-party services.
Cost analysis in Cost Management supports most Azure account types, but not all of them. To view the full list of supported account types, see Understand Cost Management data. To view cost data, you need at least read access for an Azure account. For information about assigning access to Azure Cost Management data, see Assign access to data.
Use the Azure pricing calculator to estimate the costs of using Azure OpenAI.
Azure OpenAI Service runs on Azure infrastructure that accrues costs when you deploy new resources. It's important to understand that there could be other additional infrastructure costs that might accrue.
How you're charged for Azure OpenAI Service
Base series and Codex series models
Azure OpenAI base series and Codex series models are charged per 1,000 tokens. Costs vary depending on which model series you choose: Ada, Babbage, Curie, Davinci, or Code-Cushman.
Our models understand and process text by breaking it down into tokens. For reference, each token is roughly four characters for typical English text.
Token costs apply to both input and output. For example, suppose you have a 1,000-token JavaScript code sample that you ask an Azure OpenAI model to convert to Python. You would be charged approximately 1,000 tokens for the initial input request sent, and 1,000 more tokens for the output received in response, for a total of 2,000 tokens.
In practice, for this type of completion call the token input/output wouldn't be perfectly 1:1. A conversion from one programming language to another could result in a longer or shorter output depending on many different factors including the value assigned to the max_tokens parameter.
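A minimal sketch of how per-token billing adds up (the per-1,000-token rate below is a placeholder, not a real price; see the Azure pricing page for actual rates):

```python
def completion_cost(prompt_tokens, completion_tokens, price_per_1k=0.02):
    # Both input and output tokens are billed.
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * price_per_1k

# The 1,000-token JavaScript-to-Python example above: ~2,000 billed tokens.
print(completion_cost(1000, 1000))  # -> 0.04 at the placeholder rate
```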
Base Series and Codex series fine-tuned models
Azure OpenAI fine-tuned models are charged based on three factors:

- Training hours
- Hosting hours
- Inference per 1,000 tokens
The hosting hours cost is important to be aware of since once a fine-tuned model is deployed it continues to incur an hourly cost regardless of whether you're actively using it. Fine-tuned model costs should be monitored closely.
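For example, a back-of-the-envelope sketch of the three cost factors (all rates here are hypothetical placeholders; consult the Azure OpenAI pricing page for real figures):

```python
def fine_tuned_monthly_cost(training_hours, hosting_hours, inference_tokens,
                            train_rate=1.0, host_rate=0.05,
                            token_rate_per_1k=0.002):
    # Hosting accrues for every hour the deployment exists,
    # whether or not it serves traffic.
    return (training_hours * train_rate
            + hosting_hours * host_rate
            + inference_tokens / 1000 * token_rate_per_1k)

# A deployment idle for a 30-day month still pays for 720 hosting hours:
print(fine_tuned_monthly_cost(0, 720, 0))  # -> 36.0 at the placeholder rate
```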
Important
After a customized model is deployed, if at any time the deployment remains inactive for greater than fifteen (15) days, the deployment will automatically be deleted. The deployment of a customized model is “inactive” if the model was deployed more than fifteen (15) days ago and no completions or chat completions calls were made to it during a continuous 15-day period. The deletion of an inactive deployment does NOT delete or affect the underlying customized model, and the customized model can be redeployed at any time. As described in Azure OpenAI
Service pricing , each customized (fine-tuned) model that is deployed incurs an hourly hosting cost regardless of whether completions or chat completions calls are being made to the model. To learn more about planning and managing costs with Azure OpenAI, refer to our cost management guide.
Other costs that might accrue with Azure OpenAI Service
Keep in mind that enabling capabilities like sending data to Azure Monitor Logs, alerting, etc. incurs additional costs for those services. These costs are visible under those other services and at the subscription level, but aren't visible when scoped just to your Azure OpenAI resource.
Using Azure Prepayment with Azure OpenAI Service
You can pay for Azure OpenAI Service charges with your Azure Prepayment credit. However, you can't use Azure Prepayment credit to pay for charges for third party products and services including those from the Azure Marketplace.
When you use cost analysis, you view Azure OpenAI costs in graphs and tables for different time intervals. Some examples are by day, current and prior month, and year. You also view costs against budgets and forecasted costs. Switching to longer views over time can help you identify spending trends. And you see where overspending might have occurred. If you've created budgets, you can also easily see where they're exceeded.
To view Azure OpenAI costs in cost analysis:
Sign in to the Azure portal.
Select one of your Azure OpenAI resources.
Under Resource Management select Cost analysis
By default cost analysis is scoped to the individual Azure OpenAI resource.
To understand the breakdown of what makes up that cost, it can help to modify Group by to Meter and in this case switching the chart type to Line. You can now see that for this particular resource the source of the costs is from three different model series with Text-Davinci Tokens representing the bulk of the costs.
It's important to understand scope when evaluating costs associated with Azure OpenAI. If your resources are part of the same resource group you can scope Cost Analysis at that level to understand the effect on costs. If your resources are spread across multiple resource groups you can scope to the subscription level.
However, when scoped at a higher level you often need to add additional filters to be able to zero in on Azure OpenAI usage. When scoped at the subscription level we see a number of other resources that we may not care about in the context of Azure OpenAI cost management. When scoping at the subscription level, we recommend navigating to the full Cost analysis tool under the Cost Management service. Search for "Cost Management" in the top Azure search bar to navigate to the full service experience, which includes more options like creating budgets.
If you try to add a filter by service, you'll find that you can't find Azure OpenAI in the list. This is because technically Azure OpenAI is part of Cognitive Services so the service level filter is Cognitive Services, but if you want to see all Azure OpenAI resources across a subscription without any other type of Cognitive Services resources you need to instead scope to Service tier: Azure OpenAI:
You can create budgets to manage costs and create alerts that automatically notify stakeholders of spending anomalies and overspending risks. Alerts are based on spending compared to budget and cost thresholds. Budgets and alerts are created for Azure subscriptions and resource groups, so they're useful as part of an overall cost monitoring strategy.
Budgets can be created with filters for specific resources or services in Azure if you want more granularity present in your monitoring. Filters help ensure that you don't accidentally create new resources that cost you additional money. For more information about the filter options available when you create a budget, see Group and filter options.
Important
While OpenAI has an option for hard limits that will prevent you from going over your budget, Azure OpenAI does not currently provide this functionality. You are able to kick off automation from action groups as part of your budget notifications to take more advanced actions, but this requires additional custom development on your part.
You can also export your cost data to a storage account. This is helpful when you or others need to do additional data analysis for costs. For example, a finance team can analyze the data using Excel or Power BI. You can export your costs on a daily, weekly, or monthly schedule and set a custom date range. Exporting cost data is the recommended way to retrieve cost datasets.
Learn how to optimize your cloud investment with Azure Cost Management. Learn more about managing costs with cost analysis.
Learn about how to prevent unexpected costs.
Take the Cost Management guided learning course.
Article • 04/25/2023
This tutorial will walk you through using the Azure OpenAI embeddings API to perform document search where you'll query a knowledge base to find the most relevant document.
In this tutorial, you learn how to:
- Install Azure OpenAI and other dependent Python libraries.
- Download the BillSum dataset and prepare it for analysis.
- Create environment variables for your resource's endpoint and API key.
- Use the text-embedding-ada-002 (Version 2) model.
- Use cosine similarity to rank search results.
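Cosine similarity, the last step in the list above, can be sketched with the standard library alone. The tutorial ranks documents by comparing vectors produced by the embeddings model; the toy vectors below are purely illustrative:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors -> 0.0
print(cosine_similarity([1.0, 2.0], [1.0, 2.0]))  # same direction -> close to 1.0
```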
Important
We strongly recommend using text-embedding-ada-002 (Version 2) . This model/version provides parity with OpenAI's text-embedding-ada-002 . To learn more about the improvements offered by this model, please refer to OpenAI's blog post . Even if you are currently using Version 1 you should migrate to Version 2 to take advantage of the latest weights/updated token limit. Version 1 and Version 2 are not interchangeable, so document embedding and document search must be done using the same version of the model.
- An Azure subscription - Create one for free.
- Access granted to Azure OpenAI in the desired Azure subscription. Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access. Open an issue on this repo to contact us if you have an issue.
- The following Python libraries: openai, num2words, matplotlib, plotly, scipy, scikit-learn, pandas, tiktoken.
- An Azure OpenAI resource with the text-embedding-ada-002 (Version 2) model deployed. This model is currently only available in certain regions. If you don't have a resource, the process of creating one is documented in our resource deployment guide.
Python libraries
If you haven't already, you need to install the following libraries:
```cmd
pip install openai num2words matplotlib plotly scipy scikit-learn pandas tiktoken
```
Download the BillSum dataset
BillSum is a dataset of United States Congressional and California state bills. For illustration purposes, we'll look only at the US bills. The corpus consists of bills from the 103rd-115th (1993-2018) sessions of Congress. The data was split into 18,949 train bills and 3,269 test bills. The BillSum corpus focuses on mid-length legislation from 5,000 to 20,000 characters in length. More information on the project and the original academic paper from which this dataset is derived can be found on the BillSum project's GitHub repository.
This tutorial uses a sample data file that can be downloaded from our GitHub.
You can also download the sample data by running the following command on your local machine:
Windows Command Prompt

curl "https://raw.githubusercontent.com/Azure-Samples/Azure-OpenAI-Docs-Samples/main/Samples/Tutorials/Embeddings/data/bill_sum_data.csv" --output bill_sum_data.csv
Retrieve key and endpoint
To successfully make a call against Azure OpenAI, you'll need an endpoint and a key.
Variable name | Value
ENDPOINT | This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. Alternatively, you can find the value in Azure OpenAI Studio > Playground > Code View. An example endpoint is: https://docs-test-001.openai.azure.com.
API-KEY | This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. You can use either KEY1 or KEY2.
Go to your resource in the Azure portal. The Endpoint and Keys can be found in the Resource Management section. Copy your endpoint and access key as you'll need both for authenticating your API calls. You can use either KEY1 or KEY2. Always having two keys allows you to securely rotate and regenerate keys without causing a service disruption.
Create and assign persistent environment variables for your key and endpoint.
Environment variables
Command Line

setx AZURE_OPENAI_ENDPOINT "REPLACE_WITH_YOUR_ENDPOINT_HERE"
setx AZURE_OPENAI_API_KEY "REPLACE_WITH_YOUR_KEY_VALUE_HERE"
After setting the environment variables, you may need to close and reopen Jupyter notebooks or whatever IDE you're using in order for the environment variables to be accessible. While we strongly recommend using Jupyter Notebooks, if for some reason you cannot, you'll need to modify any code that returns a pandas DataFrame by using print(dataframe_name) rather than calling dataframe_name directly, as is often done at the end of a code block.
Run the following code in your preferred Python IDE:
Python

import openai
import os
import re
import requests
import sys
from num2words import num2words
import pandas as pd
import numpy as np
from openai.embeddings_utils import get_embedding, cosine_similarity
import tiktoken

API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
RESOURCE_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")

openai.api_type = "azure"
openai.api_key = API_KEY
openai.api_base = RESOURCE_ENDPOINT
openai.api_version = "2022-12-01"

url = openai.api_base + "/openai/deployments?api-version=2022-12-01"
r = requests.get(url, headers={"api-key": API_KEY})

print(r.text)
Output

{
  "data": [
    { "scale_settings": { "scale_type": "standard" }, "model": "text-embedding-ada-002", "owner": "organization-owner", "id": "text-embedding-ada-002", "status": "succeeded", "created_at": 1657572678, "updated_at": 1657572678, "object": "deployment" },
    { "scale_settings": { "scale_type": "standard" }, "model": "code-cushman-001", "owner": "organization-owner", "id": "code-cushman-001", "status": "succeeded", "created_at": 1657572712, "updated_at": 1657572712, "object": "deployment" },
    { "scale_settings": { "scale_type": "standard" }, "model": "text-search-curie-doc-001", "owner": "organization-owner", "id": "text-search-curie-doc-001", "status": "succeeded", "created_at": 1668620345, "updated_at": 1668620345, "object": "deployment" },
    { "scale_settings": { "scale_type": "standard" }, "model": "text-search-curie-query-001", "owner": "organization-owner", "id": "text-search-curie-query-001", "status": "succeeded", "created_at": 1669048765, "updated_at": 1669048765, "object": "deployment" }
  ],
  "object": "list"
}
The output of this command will vary based on the number and type of models you've deployed. In this case, we need to confirm that we have an entry for text-embedding-ada-002. If you find that you're missing this model, you'll need to deploy the model to your resource before proceeding.
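As a programmatic alternative to eyeballing the output, you could parse the response and check for the model. This is a sketch that assumes the response shape shown above, with a hard-coded sample string standing in for r.text:

```python
import json

# Sample response text in the shape returned by the deployments endpoint above;
# in the tutorial you would pass r.text instead.
response_text = '{"data": [{"model": "text-embedding-ada-002", "id": "text-embedding-ada-002", "status": "succeeded"}], "object": "list"}'

deployments = json.loads(response_text)["data"]
has_embedding_model = any(d["model"] == "text-embedding-ada-002" for d in deployments)
print(has_embedding_model)  # True when the embedding model is deployed
```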
Now we need to read our csv file and create a pandas DataFrame. After the initial DataFrame is created, we can view the contents of the table by running df .
Python

df = pd.read_csv(os.path.join(os.getcwd(), 'bill_sum_data.csv'))  # This assumes that you have placed bill_sum_data.csv in the same directory you are running Jupyter Notebooks
df
Output:
The initial table has more columns than we need. We'll create a new, smaller DataFrame called df_bills, which will contain only the columns for text, summary, and title.
Python

df_bills = df[['text', 'summary', 'title']]
df_bills
Output:
Next we'll perform some light data cleaning by removing redundant whitespace and cleaning up the punctuation to prepare the data for tokenization.
Python

pd.options.mode.chained_assignment = None  # https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#evaluation-order-matters

# s is input text
def normalize_text(s, sep_token = " \n "):
    s = re.sub(r'\s+', ' ', s).strip()  # remove all instances of multiple spaces
    s = re.sub(r". ,", "", s)
    s = s.replace("..", ".")
    s = s.replace(". .", ".")
    s = s.replace("\n", "")
    s = s.strip()
    return s

df_bills['text'] = df_bills["text"].apply(lambda x: normalize_text(x))
Now we need to remove any bills that are too long for the token limit (8192 tokens).
Python

tokenizer = tiktoken.get_encoding("cl100k_base")
df_bills['n_tokens'] = df_bills["text"].apply(lambda x: len(tokenizer.encode(x)))
df_bills = df_bills[df_bills.n_tokens < 8192]
len(df_bills)
Output

20
Note
In this case all bills are under the embedding model input token limit, but you can use the technique above to remove entries that would otherwise cause embedding to fail. When faced with content that exceeds the embedding limit, you can also chunk the content into smaller pieces and then embed those one at a time.
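The chunking approach from the note can be sketched with a stand-in tokenizer. Whitespace splitting is used here purely for illustration; in practice you would count tokens with tiktoken's cl100k_base encoding, as above:

```python
def chunk_by_token_limit(text, max_tokens, tokenize=str.split):
    # Split text into pieces whose stand-in token counts stay under the limit.
    tokens = tokenize(text)
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]

chunks = chunk_by_token_limit("one two three four five six seven", max_tokens=3)
print(chunks)  # ['one two three', 'four five six', 'seven']
```

Each chunk could then be embedded separately rather than failing on the whole document.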
We'll once again examine df_bills.
Python

df_bills
Output:
To understand the n_tokens column a little more, as well as how text is ultimately tokenized, it can be helpful to run the following code:
Python

sample_encode = tokenizer.encode(df_bills.text[0])
decode = tokenizer.decode_tokens_bytes(sample_encode)
decode
For our docs we're intentionally truncating the output, but running this command in your environment will return the full text from index zero tokenized into chunks. You can see that in some cases an entire word is represented with a single token whereas in others parts of words are split across multiple tokens.
Output
[b'SECTION', b' ', b'1', b'.', b' SHORT', b' TITLE', b'.', b' This', b' Act', b' may', b' be', b' cited', |
b' as', b' the', b' ``', b'National', b' Science', b' Education', b' Tax', b' In', b'cent', b'ive', b' for', b' Businesses', b' Act', b' of', b' ', b'200', b'7', b"''.", b' SEC', b'.', b' ', b'2', b'.', b' C', b'RED', b'ITS', b' FOR', b' CERT', b'AIN', b' CONTRIBUT', b'IONS', b' BEN', b'EF', b'IT', b'ING', b' SC', |
If you then check the length of the decode variable, you'll find it matches the first number in the n_tokens column.
Python

len(decode)
Output

1466
Now that we understand more about how tokenization works, we can move on to embedding. It is important to note that we haven't actually tokenized the documents yet. The n_tokens column is simply a way of making sure none of the data we pass to the model for tokenization and embedding exceeds the input token limit of 8,192. When we pass the documents to the embeddings model, it will break the documents into tokens similar (though not necessarily identical) to the examples above and then convert the tokens to a series of floating point numbers that will be accessible via vector search. These embeddings can be stored locally or in an Azure Database. As a result, each bill will have its own corresponding embedding vector in the new ada_v2 column on the right side of the DataFrame.
Python

# engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2) model
df_bills['ada_v2'] = df_bills["text"].apply(lambda x: get_embedding(x, engine='text-embedding-ada-002'))
Python

df_bills
Output:
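Embedding the whole column issues one API call per row; if you run into transient rate limits, a generic retry wrapper is one option. This is a sketch that retries on any exception, which is coarser than checking specifically for rate-limit errors:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    # Call fn, retrying with exponential backoff on failure.
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage against the tutorial's helper would look like:
# embedding = with_retries(lambda: get_embedding(text, engine="text-embedding-ada-002"))
print(with_retries(lambda: 42))  # 42
```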
As we run the search code block below, we'll embed the search query "Can I get
information on cable company tax revenue?" with the same text-embedding-ada-002 (Version 2) model. Next we'll find the closest bill embedding to the newly embedded text from our query ranked by cosine similarity.
Python

# search through the bills for a specific query
def search_docs(df, user_query, top_n=3, to_print=True):
    embedding = get_embedding(
        user_query,
        engine="text-embedding-ada-002"  # engine should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2) model
    )
    df["similarities"] = df.ada_v2.apply(lambda x: cosine_similarity(x, embedding))

    res = (
        df.sort_values("similarities", ascending=False)
        .head(top_n)
    )
    if to_print:
        display(res)
    return res

res = search_docs(df_bills, "Can I get information on cable company tax revenue?", top_n=4)
Output:
Finally, we'll show the top result from document search based on user query against the entire knowledge base. This returns the top result of the "Taxpayer's Right to View Act of 1993". This document has a cosine similarity score of 0.76 between the query and the document:
Python

res["summary"][9]
Output

"Taxpayer's Right to View Act of 1993 - Amends the Communications Act of 1934 to prohibit a cable operator from assessing separate charges for any video programming of a sporting, theatrical, or other entertainment event if that event is performed at a facility constructed, renovated, or maintained with tax revenues or by an organization that receives public financial support. Authorizes the Federal Communications Commission and local franchising authorities to make determinations concerning the applicability of such prohibition. Sets forth conditions under which a facility is considered to have been constructed, maintained, or renovated with tax revenues. Considers events performed by nonprofit or public organizations that receive tax subsidies to be subject to this Act if the event is sponsored by, or includes the participation of a team that is part of, a tax exempt organization."
Using this approach, you can use embeddings as a search mechanism across documents in a knowledge base. The user can then take the top search result and use it for their downstream task, which prompted their initial query.
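The ranking at the heart of this approach is plain cosine similarity over the embedding vectors. A dependency-free sketch of the math (openai.embeddings_utils supplies an equivalent cosine_similarity helper; the two-dimensional vectors here are toy stand-ins for real 1,536-dimensional embeddings):

```python
import math

def cosine_sim(a, b):
    # Dot product of the vectors divided by the product of their magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = [1.0, 0.0]
docs = {"doc_a": [0.9, 0.1], "doc_b": [0.0, 1.0]}
ranked = sorted(docs, key=lambda d: cosine_sim(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```

Documents whose vectors point in nearly the same direction as the query vector score close to 1 and rank first.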
If you created an OpenAI resource solely for completing this tutorial and want to clean up and remove an OpenAI resource, you'll need to delete your deployed models, and then delete the resource or associated resource group if it's dedicated to your test resource. Deleting the resource group also deletes any other resources associated with it.
Learn more about Azure OpenAI's models:
Azure OpenAI Service models
Article • 04/19/2023
Reference documentation | Package (NuGet) | Additional Samples on GitHub
Important
In this how-to guide, you can use Azure Cognitive Services Speech to converse with Azure OpenAI Service. The text recognized by the Speech service is sent to Azure OpenAI. The text response from Azure OpenAI is then synthesized by the Speech service.
Speak into the microphone to start a conversation with Azure OpenAI.
The Speech service recognizes your speech and converts it into text (speech to text).
Your request as text is sent to Azure OpenAI.
The Speech service text to speech (TTS) feature synthesizes the response from Azure OpenAI to the default speaker.
Although the experience of this example is a back-and-forth exchange, Azure OpenAI doesn't remember the context of your conversation.
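The back-and-forth flow above can be sketched in a few lines of Python pseudostructure; recognize_speech, ask_openai, and synthesize_speech below are hypothetical stand-ins for the Speech and Azure OpenAI SDK calls used in the C# example that follows:

```python
def run_conversation(recognize_speech, ask_openai, synthesize_speech):
    # Loop: speech to text -> completion -> text to speech, until "Stop." is heard.
    transcript = []
    while True:
        user_text = recognize_speech()   # speech to text
        if user_text == "Stop.":
            break                        # end the conversation
        reply = ask_openai(user_text)    # text request to Azure OpenAI
        synthesize_speech(reply)         # text to speech on the default speaker
        transcript.append((user_text, reply))
    return transcript

# Usage with canned stand-ins:
utterances = iter(["Hello.", "Stop."])
log = run_conversation(lambda: next(utterances), lambda t: f"echo: {t}", lambda t: None)
print(log)  # [('Hello.', 'echo: Hello.')]
```

Note that, matching the quickstart's behavior, each turn is independent: no conversation history is passed back to the model.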
" Azure subscription - Create one for free
" Create a Speech resource in the Azure portal.
" Get the Speech resource key and region. After your Speech resource is deployed,
select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the SDK installation guide for any more requirements.
Set environment variables
This example requires environment variables named OPEN_AI_KEY, OPEN_AI_ENDPOINT, SPEECH_KEY, and SPEECH_REGION.
Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine running the application.
Tip
Don't include the key directly in your code, and never post it publicly. See the Cognitive Services security article for more authentication options like Azure Key Vault.
To set the environment variables, open a console window, and follow the instructions for your operating system and development environment.
To set the OPEN_AI_KEY environment variable, replace your-openai-key with one of the keys for your resource. To set the OPEN_AI_ENDPOINT environment variable, replace your-openai-endpoint with the endpoint for your resource. To set the SPEECH_KEY environment variable, replace your-speech-key with one of the keys for your resource. To set the SPEECH_REGION environment variable, replace your-speech-region with one of the regions for your resource.
Windows

Console

setx OPEN_AI_KEY your-openai-key
setx OPEN_AI_ENDPOINT your-openai-endpoint
setx SPEECH_KEY your-speech-key
setx SPEECH_REGION your-speech-region

Note

If you only need to access the environment variable in the current running console, you can set the environment variable with set instead of setx. After you add the environment variables, you may need to restart any running programs that will need to read the environment variable, including the console window. For example, if you are using Visual Studio as your editor, restart Visual Studio before running the example.
Follow these steps to create a new console application.
Open a command prompt where you want the new project, and create a console application with the .NET CLI. The Program.cs file should be created in the project directory.
.NET CLI

dotnet new console
Install the Speech SDK in your new project with the .NET CLI.
.NET CLI

dotnet add package Microsoft.CognitiveServices.Speech
Install the Azure OpenAI SDK (prerelease) in your new project with the .NET CLI.
.NET CLI

dotnet add package Azure.AI.OpenAI --prerelease
Replace the contents of Program.cs with the following code.
C#

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Azure;
using Azure.AI.OpenAI;
using static System.Environment;

class Program
{
    // This example requires environment variables named "OPEN_AI_KEY" and "OPEN_AI_ENDPOINT"
    // Your endpoint should look like the following: https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
    static string openAIKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY");
    static string openAIEndpoint = Environment.GetEnvironmentVariable("OPEN_AI_ENDPOINT");

    // Enter the deployment name you chose when you deployed the model.
    static string engine = "text-davinci-002";

    // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
    static string speechKey = Environment.GetEnvironmentVariable("SPEECH_KEY");
    static string speechRegion = Environment.GetEnvironmentVariable("SPEECH_REGION");

    // Prompts Azure OpenAI with a request and synthesizes the response.
    async static Task AskOpenAI(string prompt)
    {
        // Ask Azure OpenAI
        OpenAIClient client = new(new Uri(openAIEndpoint), new AzureKeyCredential(openAIKey));
        var completionsOptions = new CompletionsOptions()
        {
            Prompts = { prompt },
            MaxTokens = 100,
        };
        Response<Completions> completionsResponse = client.GetCompletions(engine, completionsOptions);
        string text = completionsResponse.Value.Choices[0].Text.Trim();
        Console.WriteLine($"Azure OpenAI response: {text}");

        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
        // The language of the voice that speaks.
        speechConfig.SpeechSynthesisVoiceName = "en-US-JennyMultilingualNeural";
        var audioOutputConfig = AudioConfig.FromDefaultSpeakerOutput();

        using (var speechSynthesizer = new SpeechSynthesizer(speechConfig, audioOutputConfig))
        {
            var speechSynthesisResult = await speechSynthesizer.SpeakTextAsync(text).ConfigureAwait(true);

            if (speechSynthesisResult.Reason == ResultReason.SynthesizingAudioCompleted)
            {
                Console.WriteLine($"Speech synthesized to speaker for text: [{text}]");
            }
            else if (speechSynthesisResult.Reason == ResultReason.Canceled)
            {
                var cancellationDetails = SpeechSynthesisCancellationDetails.FromResult(speechSynthesisResult);
                Console.WriteLine($"Speech synthesis canceled: {cancellationDetails.Reason}");

                if (cancellationDetails.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"Error details: {cancellationDetails.ErrorDetails}");
                }
            }
        }
    }

    // Continuously listens for speech input to recognize and send as text to Azure OpenAI
    async static Task ChatWithOpenAI()
    {
        // Should be the locale for the speaker's language.
        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
        speechConfig.SpeechRecognitionLanguage = "en-US";

        using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
        var conversationEnded = false;

        while (!conversationEnded)
        {
            Console.WriteLine("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.");

            // Get audio from the microphone and then send it to the TTS service.
            var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();

            switch (speechRecognitionResult.Reason)
            {
                case ResultReason.RecognizedSpeech:
                    if (speechRecognitionResult.Text == "Stop.")
                    {
                        Console.WriteLine("Conversation ended.");
                        conversationEnded = true;
                    }
                    else
                    {
                        Console.WriteLine($"Recognized speech: {speechRecognitionResult.Text}");
                        await AskOpenAI(speechRecognitionResult.Text).ConfigureAwait(true);
                    }
                    break;
                case ResultReason.NoMatch:
                    Console.WriteLine($"No speech could be recognized: ");
                    break;
                case ResultReason.Canceled:
                    var cancellationDetails = CancellationDetails.FromResult(speechRecognitionResult);
                    Console.WriteLine($"Speech Recognition canceled: {cancellationDetails.Reason}");
                    if (cancellationDetails.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"Error details: {cancellationDetails.ErrorDetails}");
                    }
                    break;
            }
        }
    }

    async static Task Main(string[] args)
    {
        try
        {
            await ChatWithOpenAI().ConfigureAwait(true);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
To increase or decrease the number of tokens returned by Azure OpenAI, change the MaxTokens property in the CompletionsOptions class instance. For more information on tokens and cost implications, see Azure OpenAI tokens and Azure OpenAI pricing.
Run your new console application to start speech recognition from a microphone:
Console

dotnet run
Important
Make sure that you set the OPEN_AI_KEY, OPEN_AI_ENDPOINT, SPEECH_KEY, and SPEECH_REGION environment variables as described previously. If you don't set these variables, the sample will fail with an error message.
Speak into your microphone when prompted. The console output includes the prompt for you to begin speaking, then your request as text, and then the response from Azure OpenAI as text. The response from Azure OpenAI should be converted from text to speech and then output to the default speaker.
Console

PS C:\dev\openai\csharp> dotnet run
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech: Make a comma separated list of all continents.
Azure OpenAI response: Africa, Antarctica, Asia, Australia, Europe, North America, South America
Speech synthesized to speaker for text [Africa, Antarctica, Asia, Australia, Europe, North America, South America]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech: Make a comma separated list of 1 Astronomical observatory for each continent. A list should include each continent name in parentheses.
Azure OpenAI response: Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)
Speech synthesized to speaker for text [Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Conversation ended.
PS C:\dev\openai\csharp>
Now that you've completed the quickstart, here are some more considerations:
- To change the speech recognition language, replace en-US with another supported language. For example, es-ES for Spanish (Spain). The default language is en-US if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see language identification.
- To change the voice that you hear, replace en-US-JennyMultilingualNeural with another supported voice. If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
- To use a different model, replace text-davinci-002 with the ID of another deployment. Keep in mind that the deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in Azure OpenAI Studio.
Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses may be filtered if harmful content is detected. For more information, see the content filtering article.
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
Learn more about Azure OpenAI
Article • 05/19/2023
In addition to the Transparency Note, we have created technical recommendations and resources to help customers design, develop, deploy, and use AI systems that implement the Azure OpenAI models responsibly. Our recommendations are grounded in the Microsoft Responsible AI Standard , which sets policy requirements that our own engineering teams follow. Much of the content of the Standard follows a pattern, asking teams to Identify, Measure, and Mitigate potential harms, and plan for how to Operate the AI system as well. In alignment with those practices, these recommendations are organized into four stages:
Identify : Identify and prioritize potential harms that could result from your AI system through iterative red-teaming, stress-testing, and analysis.
Measure : Measure the frequency and severity of those harms by establishing clear metrics, creating measurement test sets, and completing iterative, systematic testing (both manual and automated).
Mitigate : Mitigate harms by implementing tools and strategies such as prompt engineering and using our content filters. Repeat measurement to test effectiveness after implementing mitigations.
Operate : Define and execute a deployment and operational readiness plan.
In addition to their correspondence to the Microsoft Responsible AI Standard, these stages correspond closely to the functions in the NIST AI Risk Management Framework .
Identifying potential harms that could occur in or be caused by an AI system is the first stage of the Responsible AI lifecycle. The earlier you begin to identify potential harms, the more effective you can be at mitigating the harms. When assessing potential harms, it is important to develop an understanding of the types of harms that could result from
using the Azure OpenAI Service in your specific context(s). In this section, we provide recommendations and resources you can use to identify harms through an impact assessment, iterative red team testing, stress-testing, and analysis. Red teaming and stress-testing are approaches where a group of testers come together and intentionally probe a system to identify its limitations, risk surface, and vulnerabilities.
These steps have the goal of producing a prioritized list of potential harms for each specific scenario.
Identify harms that are relevant for your specific model, application, and deployment scenario.
Identify potential harms associated with the model and model capabilities (for example, GPT-3 model vs GPT-4 model) that you're using in your system. This is important to consider because each model has different capabilities, limitations, and risks, as described more fully in the sections above.
Identify any other harms or increased scope of harm presented by the intended use of the system you're developing. Consider using a Responsible AI Impact Assessment to identify potential harms.
For example, consider an AI system that summarizes text. Some uses of text generation are lower risk than others: if the system is to be used in a healthcare domain for summarizing doctor's notes, the risk of harm arising from inaccuracies is higher than if the system is summarizing online articles.
Prioritize harms based on elements of risk such as frequency and severity. Assess the level of risk for each harm and the likelihood of each risk occurring in order to prioritize the list of harms you've identified. Consider working with subject matter experts and risk managers within your organization and with relevant external stakeholders when appropriate.
Conduct red team testing and stress testing starting with the highest priority harms, to develop a better understanding of whether and how the identified harms are actually occurring in your scenario, as well as to identify new harms you didn't initially anticipate.
Share this information with relevant stakeholders using your organization's internal compliance processes.
At the end of this Identify stage, you should have a documented, prioritized list of harms. When new harms and new instances of harms emerge through further testing and use of the system, you can update and improve this list by following the above process again.
Once a list of prioritized harms has been identified, the next stage involves developing an approach for systematic measurement of each harm and conducting evaluations of the AI system. There are manual and automated approaches to measurement. We recommend you do both, starting with manual measurement.
Manual measurement is useful for:
Measuring progress on a small set of priority issues. When mitigating specific harms, it's often most productive to keep manually checking progress against a small dataset until the harm is no longer observed before moving to automated measurement.
Defining and reporting metrics until automated measurement is reliable enough to
use alone.
Spot-checking periodically to measure the quality of automatic measurement.
Automated measurement is useful for:
Measuring at a large scale with increased coverage to provide more comprehensive results.
Ongoing measurement to monitor for any regression as the system, usage, and mitigations evolve.
Below, we provide specific recommendations to measure your AI system for potential harms. We recommend you first complete this process manually and then develop a plan to automate the process:
Create inputs that are likely to produce each prioritized harm: Create measurement set(s) by generating many diverse examples of targeted inputs that are likely to produce each prioritized harm.
Generate System Outputs: Pass in the examples from the measurement sets as inputs to the system to generate system outputs. Document the outputs.
Evaluate System Outputs and Report Results to Relevant Stakeholders
Define clear metric(s). For each intended use of your system, establish metrics that measure the frequency and degree of severity of each potentially harmful output. Create clear definitions to classify outputs that will be considered harmful or problematic in the context of your system and scenario, for each type of prioritized harm you identified.
Assess the outputs against the clear metric definitions and record and quantify the occurrences of harmful outputs. Repeat the measurements periodically, to assess mitigations and monitor for any regression.
Share this information with relevant stakeholders using your organization's internal compliance processes.
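As a toy illustration of what such metrics might look like in code (the labels here are hypothetical; the actual harm definitions and severity scale come from your own scenario):

```python
# Each system output has been labeled against your harm definitions:
# severity 0 = not harmful; higher numbers = more severe.
labeled_outputs = [
    {"id": 1, "severity": 0},
    {"id": 2, "severity": 2},
    {"id": 3, "severity": 0},
    {"id": 4, "severity": 3},
]

harmful = [o for o in labeled_outputs if o["severity"] > 0]
frequency = len(harmful) / len(labeled_outputs)      # how often harmful output occurs
max_severity = max(o["severity"] for o in harmful)   # worst observed case

print(f"harm frequency: {frequency:.0%}, max severity: {max_severity}")
# harm frequency: 50%, max severity: 3
```

Tracking these numbers across test runs lets you verify that mitigations reduce frequency and severity rather than regress.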
At the end of this measurement stage, you should have a defined measurement approach to benchmark how your system performs for each potential harm as well as an initial set of documented results. As you continue implementing and testing mitigations, the metrics and measurement sets should continue to be refined (for example, to add metrics for new harms that were initially unanticipated) and the results updated.
Mitigating harms presented by large language models such as the Azure OpenAI models requires an iterative, layered approach that includes experimentation and continual measurement. We recommend developing a mitigation plan that encompasses four layers of mitigations for the harms identified in the earlier stages of this process:
At the model level, it's important to understand the model(s) you'll be using and what fine-tuning steps may have been taken by the model developers to align the model towards its intended uses and to reduce the risk of potentially harmful uses and outcomes.
For example, for GPT-4, model developers have been able to use reinforcement learning methods as a responsible AI tool to better align the model towards the designers' intended goals.
At the safety system level, you should understand the platform-level mitigations that have been implemented, such as the Azure OpenAI content filters, which help to block the output of harmful content.
At the application level, application developers can implement metaprompt and user-centered design and user experience mitigations. Metaprompts are instructions provided to the model to guide its behavior; their use can make a critical difference in guiding the system to behave in accordance with your expectations. User-centered design and user experience (UX) interventions are also key mitigation tools to prevent misuse and overreliance on AI.
At the positioning level, there are many ways to educate the people who will use or be affected by your system about its capabilities and limitations.
Below, we provide specific recommendations to implement mitigations at the different layers. Not all of these mitigations are appropriate for every scenario, and conversely, these mitigations may be insufficient for some scenarios. Give careful consideration to your scenario and the prioritized harms you identified, and as you implement mitigations, develop a process to measure and document their effectiveness for your system and scenario.
Model level Mitigations: Review and identify which Azure OpenAI base model is best suited for the system you're building and educate yourself about its capabilities, limitations, and any measures taken to reduce the risk of the potential harms you've identified. For example, if you're using GPT-4, in addition to reading this Transparency Note, you can review OpenAI's GPT-4 System Card explaining the safety challenges presented by the model and the safety processes that OpenAI adopted to prepare GPT-4 for deployment. It may be worth experimenting with different versions of the model(s) (including through red teaming and measuring) to see how the harms present differently.
Safety System Level Mitigations: Identify and evaluate the effectiveness of platform level solutions such as the Azure OpenAI content filters to help mitigate the potential harms that you have identified.
Application Level Mitigations: Prompt engineering, including metaprompt tuning, can be an effective mitigation for many different types of harm. Review and implement metaprompt (also called the "system message" or "system prompt") guidance and best practices documented here.
We recommend implementing the following user-centered design and user experience (UX) interventions, guidance, and best practices to guide users to use the system as intended and to prevent overreliance on the AI system:
Review and edit interventions: Design the user experience (UX) to encourage people who use the system to review and edit the AI-generated outputs before accepting them (see HAX G9 : Support efficient correction).
Highlight potential inaccuracies in the AI-generated outputs (see HAX G2 : Make clear how well the system can do what it can do), both when users first
start using the system and at appropriate times during ongoing use. In the first run experience (FRE), notify users that AI-generated outputs may contain inaccuracies and that they should verify information. Throughout the experience, include reminders to check AI-generated output for potential inaccuracies, both overall and in relation to specific types of content the system may generate incorrectly. For example, if your measurement process has determined that your system has lower accuracy with numbers, mark numbers in generated outputs to alert the user and encourage them to check the numbers or seek external sources for verification.
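One way to implement the number-marking suggestion above is a simple regular-expression pass over the generated text; the `[VERIFY: ...]` marker here is a hypothetical UI convention of our own, not part of any Azure OpenAI API:

```python
import re

def flag_numbers(text: str) -> str:
    """Wrap numeric spans in a hypothetical [VERIFY: ...] marker so the UI
    can highlight them and prompt the user to double-check."""
    return re.sub(r"\d+(?:[.,]\d+)*",
                  lambda m: f"[VERIFY: {m.group(0)}]",
                  text)

print(flag_numbers("Revenue grew 12.5% to 4,200 units."))
# prints: Revenue grew [VERIFY: 12.5]% to [VERIFY: 4,200] units.
```

A real application would render the markers as highlights or tooltips rather than inline text.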
User responsibility. Remind people that they are accountable for the final content when they're reviewing AI-generated content. For example, when offering code suggestions, remind the developer to review and test suggestions before accepting.
Disclose AI's role in the interaction. Make people aware that they are interacting with an AI system (as opposed to another human). Where appropriate, inform content consumers that content has been partly or fully generated by an AI model; such notices may be required by law or applicable best practices, and can reduce inappropriate reliance on AI-generated outputs and can help consumers use their own judgment about how to interpret and act on such content.
Prevent the system from anthropomorphizing. AI models may output content containing opinions, emotive statements, or other formulations that could imply that they're human-like, that could be mistaken for a human identity, or that could mislead people to think that a system has certain capabilities when it doesn't. Implement mechanisms that reduce the risk of such outputs or incorporate disclosures to help prevent misinterpretation of outputs.
Cite references and information sources. If your system generates content based on references sent to the model, clearly citing information sources helps people understand where the AI-generated content is coming from.
Limit the length of inputs and outputs, where appropriate. Restricting input and output length can reduce the likelihood of producing undesirable content, misuse of the system beyond its intended uses, or other harmful or unintended uses.
Structure inputs and/or system outputs. Use prompt engineering techniques within your application to structure inputs to the system to prevent open-ended responses. You can also limit outputs to be structured in certain formats or patterns. For example, if your system generates dialog for a fictional character in
response to queries, limit the inputs so that people can only query for a predetermined set of concepts.
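A minimal sketch of such input structuring, assuming a hypothetical chatbot scoped to a predetermined set of concepts (the topic list, length limit, and function name are illustrative):

```python
ALLOWED_TOPICS = {"weather", "news", "sports"}  # hypothetical scoped concepts
MAX_INPUT_CHARS = 500

def validate_query(query: str) -> tuple[bool, str]:
    """Accept a query only if it is short enough and mentions an allowed
    topic; otherwise return a reason for rejection."""
    if len(query) > MAX_INPUT_CHARS:
        return False, "Query too long."
    words = {w.strip(".,?!").lower() for w in query.split()}
    if not words & ALLOWED_TOPICS:
        return False, "Query is outside the supported topics."
    return True, "ok"

print(validate_query("What's the weather like today?"))  # prints: (True, 'ok')
```

Rejected queries never reach the model, which narrows the space of possible open-ended responses.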
Prepare pre-determined responses. There are certain queries to which a model may generate offensive, inappropriate, or otherwise harmful responses. When harmful or offensive queries or responses are detected, you can design your system to deliver a predetermined response to the user. Predetermined responses should be crafted thoughtfully. For example, the application can provide prewritten answers to questions such as "who/what are you?" to avoid having the system respond with anthropomorphized responses. You can also use predetermined responses for questions like, "What are your terms of use?" to direct people to the correct policy.
Restrict automatic posting on social media. Limit how people can automate your product or service. For example, you may choose to prohibit automated posting of AI-generated content to external sites (including social media), or to prohibit the automated execution of generated code.
Bot detection. Devise and implement a mechanism to prohibit users from building an API on top of your product.
Positioning Level Mitigations:
Be appropriately transparent. It's important to provide the right level of transparency to people who use the system, so that they can make informed decisions around the use of the system.
Provide system documentation. Produce and provide educational materials for your system, including explanations of its capabilities and limitations. For example, this could be in the form of a "learn more" page accessible via the system.
Publish user guidelines and best practices. Help users and stakeholders use the system appropriately by publishing best practices, for example on prompt crafting, reviewing generations before accepting them, etc. Such guidelines can help people understand how the system works. When possible, incorporate the guidelines and best practices directly into the UX.
As you implement mitigations to address potential identified harms, it's important to develop a process for ongoing measurement of the effectiveness of such mitigations, to document measurement results, and to review those measurement results to continually improve the system.
Once measurement and mitigation systems are in place, we recommend that you define and execute a deployment and operational readiness plan. This stage includes
completing appropriate reviews of your system and mitigation plans with relevant stakeholders, establishing pipelines to collect telemetry and feedback, and developing an incident response and rollback plan.
Some recommendations for how to deploy and operate a system that uses the Azure OpenAI service with appropriate, targeted harms mitigations include:
Work with compliance teams within your organization to understand what types of reviews are required for your system and when they are required (for example, legal review, privacy review, security review, accessibility review, etc.).
Develop and implement the following:
Develop a phased delivery plan. We recommend you launch systems using the Azure OpenAI service gradually using a "phased delivery" approach. This gives a limited set of people the opportunity to try the system, provide feedback, report issues and concerns, and suggest improvements before the system is released more widely. It also helps to manage the risk of unanticipated failure modes, unexpected system behaviors, and unexpected concerns being reported.
Develop an incident response plan. Develop an incident response plan and evaluate the time needed to respond to an incident.
Develop a rollback plan. Ensure you can roll back the system quickly and efficiently in case an unanticipated incident occurs.
Prepare for immediate action for unanticipated harms. Build the necessary features and processes to block problematic prompts and responses as they're discovered and as close to real-time as possible. When unanticipated harms do occur, block the problematic prompts and responses as quickly as possible, develop and deploy appropriate mitigations, investigate the incident, and implement a long-term solution.
Develop a mechanism to block people who are misusing your system. Develop a mechanism to identify users who violate your content policies (for example, by generating hate speech) or are otherwise using your system for unintended or harmful purposes, and take action against further abuse. For example, if a user frequently uses your system to generate content that is blocked or flagged by content safety systems, consider blocking them from further use of your system. Implement an appeal mechanism where appropriate.
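One possible shape for such a mechanism, assuming a hypothetical strike threshold; a production system would persist this state, integrate with its content safety signals, and pair blocking with an appeal process:

```python
from collections import defaultdict

STRIKE_LIMIT = 3  # hypothetical threshold before blocking

class AbuseTracker:
    """Count content-policy violations per user and block repeat offenders."""

    def __init__(self, limit: int = STRIKE_LIMIT):
        self.limit = limit
        self.strikes = defaultdict(int)
        self.blocked = set()

    def record_violation(self, user_id: str) -> None:
        """Record one flagged generation; block the user at the limit."""
        self.strikes[user_id] += 1
        if self.strikes[user_id] >= self.limit:
            self.blocked.add(user_id)

    def is_blocked(self, user_id: str) -> bool:
        return user_id in self.blocked
```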
Build effective user feedback channels. Implement feedback channels through which stakeholders (and the general public, if applicable) can submit feedback or report issues with generated content or that otherwise arise during their use of the system. Document how such feedback is processed, considered, and addressed. Evaluate the feedback and work to improve the system based on user feedback. One approach could be to include buttons with generated content that would allow users to identify content as "inaccurate," "harmful," or "incomplete." This could provide a more widely used, structured feedback signal for analysis.
Telemetry data. Identify and record (consistent with applicable privacy laws, policies, and commitments) signals that indicate user satisfaction or their ability to use the system as intended. Use telemetry data to identify gaps and improve the system.
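A sketch of how the button-driven feedback signals described above might be aggregated for review alongside other telemetry (the labels and class name are illustrative, and any real implementation must respect applicable privacy requirements):

```python
from collections import Counter

class FeedbackLog:
    """Aggregate 'inaccurate' / 'harmful' / 'incomplete' button presses
    so they can be reviewed over time."""

    LABELS = {"inaccurate", "harmful", "incomplete"}

    def __init__(self):
        self.counts = Counter()

    def submit(self, label: str) -> None:
        """Record one feedback event; reject unknown labels."""
        if label not in self.LABELS:
            raise ValueError(f"unknown feedback label: {label}")
        self.counts[label] += 1

    def summary(self) -> dict:
        """Return label -> count, e.g. for a review dashboard."""
        return dict(self.counts)
```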
This document is not intended to be, and should not be construed as providing, legal advice. The jurisdiction in which you're operating may have various regulatory or legal requirements that apply to your AI system. Consult a legal specialist if you are uncertain about laws or regulations that might apply to your system, especially if you think those might impact these recommendations. Be aware that not all of these recommendations and resources are appropriate for every scenario, and conversely, these recommendations and resources may be insufficient for some scenarios.
Microsoft responsible AI resources
Microsoft Azure Learning courses on responsible AI
Limited access to Azure OpenAI Service - Azure Cognitive Services | Microsoft Learn
Code of Conduct for the Azure OpenAI Service | Microsoft Learn
Data, privacy, and security for Azure OpenAI Service - Azure Cognitive Services | Microsoft Learn
Article • 05/19/2023
An AI system includes not only the technology, but also the people who use it, the people who are affected by it, and the environment in which it's deployed. Creating a system that is fit for its intended purpose requires an understanding of how the technology works, what its capabilities and limitations are, and how to achieve the best performance. Microsoft's Transparency Notes are intended to help you understand how our AI technology works, the choices system owners can make that influence system performance and behavior, and the importance of thinking about the whole system, including the technology, the people, and the environment. You can use Transparency Notes when developing or deploying your own system, or share them with the people who will use or be affected by your system.
Microsoft’s Transparency Notes are part of a broader effort at Microsoft to put our AI Principles into practice. To find out more, see the Microsoft's AI principles .
Azure OpenAI provides customers with a fully managed AI service that lets developers and data scientists apply OpenAI's powerful models including models that can generate natural language, code, and images. Within the Azure OpenAI Service, the OpenAI models are integrated with Microsoft-developed content filtering and abuse detection models. Learn more about content filtering here and abuse detection here.
Select the tabs to see content for the relevant model type.
Introduction
As part of the fully managed Azure OpenAI Service, the GPT-3 models analyze and generate natural language, Codex models analyze and generate code and plain text code commentary, and the GPT-4 models can understand and generate natural language and code. These models use an autoregressive architecture, meaning they
use data from prior observations to predict the most probable next word. This process is then repeated by appending the newly generated content to the original text to produce the complete generated response. Because the response is conditioned on the input text, these models can be applied to various tasks simply by changing the input text.
The GPT-3 series of models are pretrained on a wide body of publicly available free text data. This data is sourced from a combination of web crawling (specifically, a filtered version of Common Crawl , which includes a broad range of text from the internet and comprises 60 percent of the weighted pretraining dataset) and higher-quality datasets, including an expanded version of the WebText dataset, two internet-based books corpora and English-language Wikipedia. The GPT-4 base model was trained using publicly available data (such as internet data) and data that was licensed by OpenAI. The model was fine-tuned using reinforcement learning with human feedback (RLHF).
Learn more about the training and modeling techniques in OpenAI's GPT-3, GPT-4, and Codex research papers. The guidance below is also drawn from OpenAI's safety best practices.
Key terms
| Term | Definition |
| --- | --- |
| Prompt | The text you send to the service in the API call. This text is then input into the model. For example, one might input the following prompt:<br>`Convert the questions to a command:`<br>`Q: Ask Constance if we need some bread`<br>`A: send-msg 'find constance' Do we need some bread?`<br>`Q: Send a message to Greg to figure out if things are ready for Wednesday.`<br>`A:` |
| Completion or Generation | The text Azure OpenAI outputs in response. For example, the service may respond with the following answer to the above prompt: `send-msg 'find greg' figure out if things are ready for Wednesday.` |
| Token | Azure OpenAI processes text by breaking it down into tokens. Tokens can be words or just chunks of characters. For example, the word "hamburger" gets broken up into the tokens "ham", "bur", and "ger", while a short and common word like "pear" is a single token. Many tokens start with a whitespace, for example " hello" and " bye". |
Text and code models
The GPT-4, GPT-3, and Codex Azure OpenAI Service models use natural language instructions and examples in the prompt to identify the task. The model then completes the task by predicting the most probable next text. This technique is known as "in-context" learning. These models are not retrained during this step but instead give predictions based on the context you include in the prompt.
There are three main approaches for in-context learning. These approaches vary based on the amount of task-specific data that is given to the model:
Few-shot : In this case, a user includes several examples in the prompt that demonstrate the expected answer format and content. The following example shows a few-shot prompt providing multiple examples:
Convert the questions to a command:
Q: Ask Constance if we need some bread
A: send-msg `find constance` Do we need some bread?
Q: Send a message to Greg to figure out if things are ready for Wednesday.
A: send-msg `find greg` Is everything ready for Wednesday?
Q: Ask Ilya if we're still having our meeting this evening
A: send-msg `find ilya` Are we still having a meeting this evening?
Q: Contact the ski store and figure out if I can get my skis fixed before I leave on Thursday
A: send-msg `find ski store` Would it be possible to get my skis fixed before I leave on Thursday?
Q: Thank Nicolas for lunch
A: send-msg `find nicolas` Thank you for lunch!
Q: Tell Constance that I won't be home before 19:30 tonight — unmovable meeting.
A: send-msg `find constance` I won't be home before 19:30 tonight. I have a meeting I can't move.
Q: Tell John that I need to book an appointment at 10:30
A:
The number of examples typically ranges from 0 to 100 depending on how many can fit in the maximum input length for a single prompt. Few-shot learning enables a major reduction in the amount of task-specific data required for accurate predictions.
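Few-shot prompts like the one above are usually assembled programmatically from a pool of example pairs; this sketch (function and variable names are our own) builds the same Q/A structure:

```python
# Hypothetical pool of (question, command) example pairs.
EXAMPLES = [
    ("Ask Constance if we need some bread",
     "send-msg `find constance` Do we need some bread?"),
    ("Thank Nicolas for lunch",
     "send-msg `find nicolas` Thank you for lunch!"),
]

def build_few_shot_prompt(task: str, examples, query: str) -> str:
    """Assemble a few-shot prompt: task instruction, Q/A example pairs,
    then the new query with an empty A: for the model to complete."""
    lines = [task]
    for q, a in examples:
        lines += [f"Q: {q}", f"A: {a}"]
    lines += [f"Q: {query}", "A:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Convert the questions to a command:",
    EXAMPLES,
    "Tell John that I need to book an appointment at 10:30",
)
```

In practice the example pool is trimmed so the assembled prompt fits within the model's maximum input length.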
One-shot : This case is the same as the few-shot approach except only one example is provided. The following example shows a one-shot prompt:
Convert the questions to a command:
Q: Ask Constance if we need some bread
A: send-msg `find constance` Do we need some bread?
Q: Send a message to Greg to figure out if things are ready for Wednesday.
A:

Zero-shot : In this case, no examples are provided to the model and only the task request is provided. The following example shows a zero-shot prompt:

Convert the question to a command:
Q: Ask Constance if we need some bread
A:
Intended uses
The GPT-4, GPT-3, and Codex models in the Azure OpenAI service can be used in multiple scenarios. The following list isn't comprehensive, but it illustrates the diversity of tasks that can be supported with appropriate mitigations:
Chat and conversation interaction : Users can interact with a conversational agent that responds with responses drawn from trusted documents such as internal company documentation or tech support documentation.
Conversations must be limited to answering scoped questions.
Chat and conversation creation : Users can create a conversational agent that responds with responses drawn from trusted documents such as internal company documentation or tech support documentation. Conversations must be limited to answering scoped questions.
Code generation or transformation scenarios : For example, converting one programming language to another, generating docstrings for functions, converting natural language to SQL.
Journalistic content : For use to create new journalistic content or to rewrite journalistic content submitted by the user as a writing aid for predefined topics. Users cannot use the application as a general content creation tool for all topics. May not be used to generate content for political campaigns.

Question-answering : Users can ask questions and receive answers from trusted source documents such as internal company documentation. The application does not generate answers ungrounded in trusted source documentation.

Reason over structured and unstructured data : Users can analyze inputs using classification, sentiment analysis of text, or entity extraction. Examples include analyzing product feedback sentiment, analyzing support calls and transcripts, and refining text-based search with embeddings.

Search : Users can search trusted source documents such as internal company documentation. The application does not generate results ungrounded in trusted source documentation.

Summarization : Users can submit content to be summarized for predefined topics built into the application and cannot use the application as an open-ended summarizer. Examples include summarization of internal company documentation, call center transcripts, technical reports, and product reviews.

Writing assistance on specific topics : Users can create new content or rewrite content submitted by the user as a writing aid for business content or predefined topics. Users can only rewrite or create content for specific business purposes or predefined topics and cannot use the application as a general content creation tool for all topics. Examples of business content include proposals and reports. For journalistic use, see the Journalistic content use case above.

Considerations when choosing a use case

We encourage customers to use the Azure OpenAI GPT-4, GPT-3, and Codex models in their innovative solutions or applications as approved in their Limited Access registration form.
However, here are some considerations when choosing a use case:

Not suitable for open-ended, unconstrained content generation. Scenarios where users can generate content on any topic are more likely to produce offensive or harmful text. The same is true of longer generations.

Not suitable for scenarios where up-to-date, factually accurate information is crucial, unless you have human reviewers or are using the models to search your own documents and have verified suitability for your scenario. The service does not have information about events that occur after its training date, likely has missing knowledge about some topics, and may not always produce factually accurate information.

Avoid scenarios where use or misuse of the system could result in significant physical or psychological injury to an individual. For example, scenarios that diagnose patients or prescribe medications have the potential to cause significant harm.

Avoid scenarios where use or misuse of the system could have a consequential impact on life opportunities or legal status. Examples include scenarios where the AI system could affect an individual's legal status, legal rights, or their access to credit, education, employment, healthcare, housing, insurance, social welfare benefits, services, opportunities, or the terms on which they're provided.

Avoid high stakes scenarios that could lead to harm. The models hosted by Azure OpenAI service reflect certain societal views, biases, and other undesirable content present in the training data or the examples provided in the prompt. As a result, we caution against using the models in high-stakes scenarios where unfair, unreliable, or offensive behavior might be extremely costly or lead to harm.

Carefully consider use cases in high stakes domains or industries. Examples include but are not limited to healthcare, medicine, finance, or legal.

Carefully consider well-scoped chatbot scenarios. Limiting the use of the service in chatbots to a narrow domain reduces the risk of generating unintended or undesirable responses.

Carefully consider all generative use cases. Content generation scenarios may be more likely to produce unintended outputs, and these scenarios require careful consideration and mitigations.
When it comes to large-scale natural language models and image models, there are particular fairness and responsible AI issues to consider. People use language and images to describe the world and to express their beliefs, assumptions, attitudes, and values. As a result, publicly available text and image data typically used to train large-scale natural language processing and image generation models contains societal biases relating to race, gender, religion, age, and other groups of people, as well as other undesirable content. These societal biases are reflected in the distributions of words, phrases, and syntactic structures.
Technical limitations, operational factors and ranges
Caution
Please be advised that this section contains illustrative examples which include terms and language that some individuals may find offensive.
Large-scale natural language and image models trained with such data can potentially behave in ways that are unfair, unreliable, or offensive, in turn causing harms. Some of the ways are listed here. We emphasize that these types of harms aren't mutually exclusive. A single model can exhibit more than one type of harm, potentially relating to multiple different groups of people. For example:
Allocation: These models can be used in ways that lead to unfair allocation of resources or opportunities. For example, automated resume screening systems can withhold employment opportunities from one gender if they're trained on resume data that reflects the existing gender imbalance in a particular industry. Or the DALL·E 2 model could be used to create imagery in the style of a known artist, which could affect the value of the artist's work or the artist's life opportunities.
Quality of service: The Azure OpenAI models are trained primarily on English text and images with English text descriptions. Languages other than English will experience worse performance. English language varieties with less representation in the training data might experience worse performance. The publicly available images used to train the DALL·E models might reinforce public bias and other undesirable content. The models are also unable to generate consistent photorealistic images and comprehensive text at this time.
Stereotyping: These models can reinforce stereotypes. For example, when
translating "He is a nurse" and "She is a doctor" into a genderless language such as Turkish and then back into English, many machine translation systems yield the stereotypical (and incorrect) results of "She is a nurse" and "He is a doctor." With DALL·E 2, when generating an image based on the prompt "Fatherless children," the model could generate images of Black children only, reinforcing harmful stereotypes that may exist in publicly available images.
Demeaning: These models can demean people. For example, an open-ended content generation system with inappropriate or insufficient mitigations might produce content that is offensive or demeaning to a particular group of people.
Overrepresentation and underrepresentation: These models can over- or under-represent groups of people, or even erase their representation entirely. For example, if text prompts that contain the word "gay" are detected as potentially harmful or offensive, this could lead to the underrepresentation or even erasure of legitimate image generations by or about the LGBTQIA+ community.
Inappropriate or offensive content: These models can produce other types of inappropriate or offensive content. Examples include the ability to create images that potentially contain harmful artifacts such as hate symbols; images that elicit harmful connotations; images that relate to contested, controversial, or ideologically polarizing topics; images that are manipulative; images that contain sexually charged content that isn't caught by sexual-related content filters; and images that relate to sensitive or emotionally charged topics. For example, a well-intentioned text prompt aimed to create an image of the New York skyline with clouds and airplanes flying over it might unintentionally generate images that elicit sentiments related to the events surrounding 9/11.
Disinformation and misinformation about sensitive topics: Because DALL·E 2 is a powerful image generation model, it can be used to produce disinformation and misinformation that can be extremely harmful. For example, the model might generate an image of a political leader engaging in activity of a violent or sexual (or simply inaccurate) nature that might lead to defamation and other consequential harms, including but not limited to public protests, political change, or fake news.
Information reliability: Language model responses can fabricate content that may
sound reasonable but is nonsensical or inaccurate with respect to external validation sources. Even when drawing responses from trusted source information, responses may misrepresent that content.
False information: Azure OpenAI doesn't fact-check or verify content that is provided by customers or users. Depending on how you've developed your application, it might produce false information unless you've built in mitigations (see Best practices for improving system performance below).
System performance
In many AI systems, performance is often defined in relation to accuracy—that is, how often the AI system offers a correct prediction or output. With large-scale natural language models and image models, two different users might look at the same output and have different opinions of how useful or relevant it is, which means that performance for these systems must be defined more flexibly. Here, we broadly consider performance to mean that the application performs as you and your users expect, including not generating harmful outputs.
Azure OpenAI service can support a wide range of applications like search, classification, code generation, and image generation, each with different performance metrics and mitigation strategies. There are several steps you can take to mitigate some of the concerns listed under "Limitations" and to improve performance. Other important
mitigation techniques are outlined in the section Evaluating and integrating Azure OpenAI for your use below.
Best practices for improving system performance
Show and tell when designing prompts. With text and code models, make it clear to the model what kind of outputs you expect through instructions, examples, or a combination of the two. If you want the model to rank a list of items in alphabetical order or to classify a paragraph by sentiment, show it that's what you want.
Keep your application on topic. Carefully structure prompts and image inputs to reduce the chance of producing undesired content, even if a user tries to use it for this purpose. For instance, you might indicate in your prompt that a chatbot only engages in conversations about mathematics and otherwise responds "I'm sorry. I'm afraid I can't answer that." Adding adjectives like "polite" and examples in your desired tone to your prompt can also help steer outputs. With image models, you might indicate in your prompt or image input that your application generates only conceptual images. It might otherwise generate a pop-up notification that explains that the application is not for photorealistic use or to portray reality. Consider nudging users toward acceptable queries and image inputs, either by listing such examples up front or by offering them as suggestions upon receiving an off-topic request. Consider training a classifier to determine whether an input (prompt or image) is on topic or off topic.
Provide quality data. With text and code models, if you're trying to build a classifier or get the model to follow a pattern, make sure that there are enough examples. Be sure to proofread your examples—the model is usually smart enough to see through basic spelling mistakes and give you a response, but it also might assume this is intentional and it could affect the response. Providing quality data also includes giving your model reliable data to draw responses from in chat and question answering systems.
Measure model quality. As part of general model quality, consider measuring and improving fairness-related metrics and other metrics related to responsible AI in addition to traditional accuracy measures for your scenario. Consider resources like this checklist when you measure the fairness of the system. These measurements come with limitations, which you should acknowledge and communicate to stakeholders along with evaluation results.
Limit the length, structure, and rate of inputs and outputs. Restricting the length or structure of inputs and outputs can increase the likelihood that the application will stay on task and mitigate, at least in part, any potentially unfair, unreliable, or offensive behavior. Other options to reduce the risk of misuse include (i) restricting
the source of inputs (for example, limiting inputs to a particular domain or to authenticated users rather than being open to anyone on the internet) and (ii) implementing usage rate limits.
Encourage human review of outputs prior to publication or dissemination. With generative AI, there is potential for generating content that might be offensive or not related to the task at hand, even with mitigations in place. To ensure that the generated output meets the task of the customer, consider building ways to remind customers to review their outputs for quality prior to sharing widely. This can reduce many different harms, including offensive material, disinformation, and more.
Implement additional scenario-specific mitigations. Refer to the mitigations outlined in Evaluating and integrating Azure OpenAI for your use including content moderation strategies. These do not represent every mitigation that might be required for your application, but they point to the general minimum baseline we check for when approving use cases for Azure OpenAI Service.
For additional information on how to evaluate and integrate these models responsibly, please see the RAI Overview document.
Microsoft responsible AI resources
Microsoft Azure Learning courses on responsible AI
Limited access to Azure OpenAI Service - Azure Cognitive Services | Microsoft Learn
Code of Conduct for the Azure OpenAI Service | Microsoft Learn
Data, privacy, and security for Azure OpenAI Service - Azure Cognitive Services | Microsoft Learn
Article • 04/28/2023
As part of Microsoft's commitment to responsible AI, we are designing and releasing Azure OpenAI Service with the intention of protecting the rights of individuals and society and fostering transparent human-computer interaction. For this reason, we currently limit the access and use of Azure OpenAI, including limiting access to the ability to modify content filters and/or abuse monitoring.
Azure OpenAI requires registration and is currently only available to approved enterprise customers and partners. Customers who wish to use Azure OpenAI are required to submit a registration form .
Customers must attest to any and all use cases for which they will use the service (the use cases from which customers may select will populate in the form after selection of the desired model(s) in Question 22 in the initial registration form). Customers who wish to add additional use cases after initial onboarding must submit the additional use cases using this form . The use of Azure OpenAI is limited to use cases that have been selected in a registration form. Microsoft may require customers to re-verify this information. Read more about example use cases and use cases to avoid here.
Access to the Azure OpenAI Service is subject to Microsoft's sole discretion based on eligibility criteria and a vetting process, and customers must acknowledge that they have read and understand the Azure terms of service for Azure OpenAI Service.
Azure OpenAI Service is made available to customers under the terms governing their subscription to Microsoft Azure Services, including the Azure OpenAI section of the Microsoft Product Terms . Please review these terms carefully as they contain important conditions and obligations governing your use of Azure OpenAI Service.
Important links
Add additional use cases (if needed)
Register to modify content filters and abuse monitoring (if needed)
FAQ about Limited Access can be found here. If you need help with Azure OpenAI, find support here. Report abuse of Azure OpenAI here .
Report problematic content to cscraireport@microsoft.com.
Code of conduct for Azure OpenAI Service integrations Transparency note for Azure OpenAI Service Characteristics and limitations for Azure OpenAI Service Data, privacy, and security for Azure OpenAI Service
Article • 03/13/2023
The following Code of Conduct defines the requirements that all Azure OpenAI Service implementations must adhere to in good faith. This code of conduct is in addition to the Acceptable Use Policy in the Microsoft Online Services Terms .
Azure OpenAI Service is a Limited Access service that requires registration and is only available to approved enterprise customers and partners. Customers who wish to use this service are required to register through this form . To learn more, see Limited Access to Azure OpenAI Service.
Integrations with Azure OpenAI Service must:
Implement meaningful human oversight
Implement strong technical limits on inputs and outputs to reduce the likelihood of misuse beyond the application's intended purpose
Test applications thoroughly to find and mitigate undesirable behaviors
Establish feedback channels
Implement additional scenario-specific mitigations
To learn more, see the Azure OpenAI transparency note.
Integrations with Azure OpenAI Service must not:
be used in any way that violates Microsoft's Acceptable Use Policy , including but not limited to any use prohibited by law, regulation, government order, or decree, or any use that violates the rights of others;
be used in any way that is inconsistent with this code of conduct, including the Limited Access requirements, the Responsible AI mitigation requirements, and the Content requirements;
exceed the use case(s) you identified to Microsoft in connection with your request to use the service;
interact with individuals under the age of consent in any way that could result in exploitation or manipulation or is otherwise prohibited by law or regulation;
generate or interact with content prohibited in this Code of Conduct;
be presented alongside or monetize content prohibited in this Code of Conduct;
make decisions without appropriate human oversight if your application may have a consequential impact on any individual's legal position, financial position, life opportunities, employment opportunities, human rights, or result in physical or psychological injury to an individual;
infer sensitive information about people without their explicit consent, unless used in a lawful manner by a law enforcement entity, court, or government official subject to judicial oversight in a jurisdiction that maintains a fair and independent judiciary; or
be used for chatbots that (i) are erotic, romantic, or used for companionship purposes, or which are otherwise prohibited by this Code of Conduct; (ii) are personas of specific people without their explicit consent; (iii) claim to have special wisdom/insight/knowledge, unless very clearly labeled as being for entertainment purposes only; or (iv) enable end users to create their own chatbots without oversight.
We prohibit the use of our service for generating content that can inflict harm on individuals or society. Our content policies are intended to improve the safety of our platform.
These content requirements apply to the output of all models developed by OpenAI and hosted in Azure OpenAI, such as GPT-3, GPT-4, Codex models, and DALL·E 2, and includes content provided as input to the service and content generated as output from the service.
Exploitation and Abuse
Child sexual exploitation and abuse
Azure OpenAI Service prohibits content that describes, features, or promotes child sexual exploitation or abuse, whether or not prohibited by law. This includes sexual content involving a child or that sexualizes a child.
Grooming
Azure OpenAI Service prohibits content that describes or is used for purposes of grooming of children. Grooming is the act of an adult building a relationship with a child for the purposes of exploitation, especially sexual exploitation. This includes communicating with a child for the purpose of sexual exploitation, trafficking, or other forms of exploitation.
Non-consensual intimate content
Azure OpenAI Service prohibits content that describes, features, or promotes non-consensual intimate activity.
Sexual solicitation
Azure OpenAI Service prohibits content that describes, features, or promotes, or is used for, purposes of solicitation of commercial sexual activity and sexual services. This includes encouragement and coordination of real sexual activity.
Trafficking
Azure OpenAI Service prohibits content describing or used for purposes of human trafficking. This includes the recruitment of individuals, facilitation of transport, and payment for, and the promotion of, exploitation of people such as forced labor, domestic servitude, sexual slavery, forced marriages, and forced medical procedures.
Suicide and Self-Injury
Azure OpenAI Service prohibits content that describes, praises, supports, promotes, glorifies, encourages and/or instructs individual(s) on self-injury or to take their life.
Violent Content and Conduct
Graphic violence and gore
Azure OpenAI Service prohibits content that describes, features, or promotes graphic violence or gore.
Terrorism and Violent Extremism
Azure OpenAI Service prohibits content that depicts an act of terrorism; praises, or supports a terrorist organization, terrorist actor, or violent terrorist ideology; encourages terrorist activities; offers aid to terrorist organizations or terrorist causes; or aids in recruitment to a terrorist organization.
Violent Threats, Incitement, and Glorification of Violence
Azure OpenAI Service prohibits content advocating or promoting violence toward others through violent threats or incitement.
Harmful Content
Hate speech and discrimination
Azure OpenAI Service prohibits content that attacks, denigrates, intimidates, degrades, targets, or excludes individuals or groups on the basis of traits such as actual or perceived race, ethnicity, national origin, gender, gender identity, sexual orientation, religious affiliation, age, disability status, caste, or any other characteristic that is associated with systemic prejudice or marginalization.
Bullying and harassment
Azure OpenAI Service prohibits content that targets individual(s) or group(s) with threats, intimidation, insults, degrading or demeaning language or images, promotion of physical harm, or other abusive behavior such as stalking.
Deception, disinformation, and inauthentic activity
Azure OpenAI Service prohibits content that is intentionally deceptive and likely to adversely affect the public interest, including deceptive or untrue content relating to health, safety, election integrity, or civic participation. Azure OpenAI Service also prohibits inauthentic interactions, such as fake accounts, automated inauthentic activity, impersonation to gain unauthorized information or privileges, and claims to be from any person, company, government body, or entity without explicit permission to make that representation.
Active malware or exploits
Azure OpenAI Service prohibits content that directly supports unlawful active attacks or malware campaigns that cause technical harms, such as delivering malicious executables, organizing denial of service attacks, or managing command and control servers.
Additional content policies
We prohibit the use of our Azure OpenAI Service for scenarios in which the system is likely to generate undesired content due to limitations in the models or scenarios in which the system cannot be applied in a way that properly manages potential negative consequences to people and society. Without limiting the foregoing restriction, Microsoft reserves the right to revise and expand the above Content requirements to address specific harms to people and society.
This includes prohibiting content that is sexually graphic, including consensual pornographic content and intimate descriptions of sexual acts, as well as content that may influence the political process, such as an election, passage of legislation, and content for campaigning purposes.
We may at times limit our service's ability to respond to particular topics, such as probing for personal information or seeking opinions on sensitive topics or current events.
We prohibit the use of Azure OpenAI Service for activities that significantly harm other individuals, organizations, or society, including but not limited to use of the service for purposes in conflict with the applicable Azure Legal Terms and the Microsoft Product Terms .
If Azure OpenAI Service outputs problematic content that you believe should have been filtered, report it at cscraireport@microsoft.com.
Limited access to Azure OpenAI Service Transparency note for Azure OpenAI Service
Data, privacy, and security for Azure OpenAI Service
Article • 04/05/2023
This article provides details regarding how data provided by you to the Azure OpenAI service is processed, used, and stored. Azure OpenAI stores and processes data to provide the service and to monitor for uses that violate the applicable product terms. Please also see the Microsoft Products and Services Data Protection Addendum , which governs data processing by the Azure OpenAI Service except as otherwise provided in the applicable Product Terms .
Azure OpenAI was designed with compliance, privacy, and security in mind; however, the customer is responsible for its use and the implementation of this technology.
Azure OpenAI processes the following types of data:
Prompts and completions. Prompts are submitted by the user, and completions are output by the service, via the completions (/completions, /chat/completions) and embeddings operations.
Training & validation data. You can provide your own training data consisting of prompt-completion pairs for the purposes of fine-tuning an OpenAI model.
Results data from training process. After training a fine-tuned model, the service will output meta-data on the job which includes tokens processed and validation scores at each step.
The diagram below illustrates how your data is processed. This diagram covers three different types of processing:
How the Azure OpenAI Service creates a fine-tuned (custom) model with your training data;
How the Azure OpenAI Service processes your text prompts to generate completions and embeddings results; and
How the Azure OpenAI Service and Microsoft personnel analyze prompts and completions for abuse, misuse or harmful content generation, or for debugging purposes in the event of a failure.
Training data for purposes of fine-tuning an OpenAI model
The training data (prompt-completion pairs) submitted to the Fine-tunes API through the Azure OpenAI Studio is pre-processed using automated tools for quality checking including a data format check. The training data is then imported to the model training component on the Azure OpenAI platform. During the training process, the training data is decomposed into batches and used to modify the weights of the OpenAI models.
Training data provided by the customer is only used to fine-tune the customer's model and is not used by Microsoft to train or improve any Microsoft models.
Text prompts to generate completions and embeddings results
Once a model deployment (consisting of a customer's fine-tuned model or a base model endpoint) is provisioned in a customer's Azure OpenAI resource, the customer can submit text prompts to the model using our Completions or Embeddings operations
through the REST API, client libraries, or the Azure OpenAI Studio; the model generates text outputs (completions) that are returned through the API.
When data is submitted to the service, it is processed through our content filters as well as the specified OpenAI model. The content filtering models are run on both the prompt inputs as well as the generated completions.
No prompts or completions are stored in the model during these operations, and prompts and completions are not used to train, retrain or improve the models.
Preventing abuse and harmful content generation
The Azure OpenAI Service includes a content management system that works alongside the models to filter potentially harmful content. This system works by running both the input prompt and generated completion through an ensemble of classification models aimed at detecting misuse. If the system identifies harmful content, customers receive either an error on the API call if the prompt was deemed inappropriate or the finish_reason on the response will be content_filter to signify that some of the generation was filtered. Learn more about content filtering here. No prompts or completions are stored in these content classification models; prompts and completions are not used to train, retrain or improve the classification models.
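Applications can detect filtered generations programmatically by inspecting finish_reason on each returned choice. A minimal sketch (the helper name and the sample response are illustrative; the response shape follows the completions REST API):

```python
def check_content_filtering(response):
    """Return a message for each choice whose generation was cut off by
    the content filter (finish_reason == "content_filter")."""
    flagged = []
    for choice in response.get("choices", []):
        if choice.get("finish_reason") == "content_filter":
            flagged.append(f"choice {choice.get('index')}: output was filtered")
    return flagged

# Hypothetical parsed completions response:
response = {
    "choices": [
        {"index": 0, "text": "Once upon a time", "finish_reason": "stop"},
        {"index": 1, "text": "", "finish_reason": "content_filter"},
    ]
}
print(check_content_filtering(response))
```

Recall that a fully inappropriate prompt returns an error on the API call instead, so both paths should be handled.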
In addition to synchronous content filtering, the Azure OpenAI Service stores prompts and completions from the service for up to thirty (30) days to monitor for content and/or behaviors that suggest use of the service in a manner that may violate applicable product terms. Authorized Microsoft employees may review prompt and completion data that has triggered our automated systems to investigate and verify potential abuse. For customers who have deployed Azure OpenAI Service in the European Economic Area, the authorized Microsoft employees will be located in the European Economic Area.
In the event of a confirmed policy violation, a customer may be asked to take immediate action to remediate the issue and to prevent further abuse. Failure to address the issue may result in suspension or termination of Azure OpenAI resource access.
Customers may request to modify content filtering and/or abuse monitoring by submitting the form here . If a customer is approved and remains in compliance with all requirements to modify abuse monitoring, then prompts and completions are not stored.
Training, validation, and training results data. The Files API allows customers to upload their training data for the purpose of fine-tuning a model. This data is stored in Azure Storage, encrypted at rest by Microsoft Managed keys, within the same region as the resource and logically isolated with their Azure subscription and API Credentials. Uploaded files can be deleted by the user via the DELETE API operation.
Fine-tuned OpenAI models. The Fine-tunes API allows customers to create their own fine-tuned version of the OpenAI models based on the training data that they have uploaded to the service via the Files APIs. The trained fine-tuned models are stored in Azure Storage in the same region, encrypted at rest and logically isolated with their Azure subscription and API credentials. Fine-tuned models can be deleted by the user by calling the DELETE API operation.
Prompts and completions. The prompts and completions data may be temporarily stored by the Azure OpenAI Service in the same region as the resource for up to 30 days. This data is encrypted and is only accessible to authorized Microsoft employees for (1) debugging purposes in the event of a failure, and (2) investigating patterns of abuse and misuse to determine if the service is being used in a manner that violates the applicable product terms. Note: When a customer is approved for modified abuse monitoring, prompts and completions data are not stored, and thus Microsoft employees have no access to the data.
To learn more about Microsoft's privacy and security commitments visit the Microsoft Trust Center .
Can a customer opt out of the logging and human review process?
Some customers may want to use the Azure OpenAI Service for a use case that involves the processing of sensitive, highly confidential, or legally-regulated input data but where the likelihood of harmful outputs and/or misuse is low. These customers may conclude that they do not want or do not have the right to permit Microsoft to process such data for abuse detection, as described above, due to their internal policies or applicable legal regulations. To address these concerns, Microsoft allows customers who meet additional Limited Access eligibility criteria and attest to specific use cases to apply to modify the Azure OpenAI content management features.
If Microsoft approves a customer's request to modify abuse monitoring, then Microsoft does not store any prompts and completions associated with the approved Azure subscription for which abuse monitoring is configured off. In this case, because no prompts and completions are stored at rest in the Service Results Store, the human review process is not possible and is not performed.
Customers can apply for modified abuse monitoring here: https://aka.ms/oai/modifiedaccess .
How can a customer verify if logging for abuse monitoring is off?
There are two ways for an approved customer to verify that logging for abuse monitoring has been turned off in their approved Azure subscription: (1) the Azure portal or (2) the Azure CLI (or any MGMT API).
Note: The value of "false" for the "ContentLogging" attribute appears only if logging is turned off. Otherwise, this property will not appear in either the Azure portal or the Azure CLI's output. If a customer wants to verify that logging is on, they need to raise a support ticket.
Prerequisites
Sign in to Azure.
Select the Azure subscription that hosts the Azure OpenAI Service resource.
Navigate to the Overview page of the Azure OpenAI Service resource.
Logging Status Verification via the Azure portal
Go to the resource Overview page.
Select the 'JSON view' link in the top right corner.
When logging for abuse monitoring is off, the Capabilities list will include a value called 'ContentLogging' set to false.
JSON |
...
{
    "name": "ContentLogging",
    "value": "false"
}
...
Logging Status Verification via the Azure CLI (or MGMT API)
Execute the following command in Azure CLI to see the same JSON data as shown in the Azure portal above.
az cognitiveservices account show -n resource_name -g resource_group |
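Once that JSON is retrieved (for example with the CLI command above and `-o json`), the check can be automated. A minimal sketch, assuming the capabilities list appears under properties as in the JSON view shown earlier; the function name is illustrative:

```python
import json

def abuse_logging_disabled(account_json):
    """True only when the capabilities list contains
    {"name": "ContentLogging", "value": "false"}; per the note above,
    the entry is absent entirely when logging is on."""
    account = json.loads(account_json)
    for cap in account.get("properties", {}).get("capabilities", []):
        if cap.get("name") == "ContentLogging":
            return cap.get("value", "").lower() == "false"
    return False  # entry absent: logging is not confirmed off

# Hypothetical excerpt of the command's JSON output:
sample = '{"properties": {"capabilities": [{"name": "ContentLogging", "value": "false"}]}}'
disabled = abuse_logging_disabled(sample)
```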
Is customer data logged with content filtering?
No, content filtering functions differently than abuse monitoring, and logging or storage of data is not needed. Content filtering works by applying algorithmic detection to the prompts and completions at inference time to determine if content should be filtered.
No data is stored at rest as part of this process, and thus there is no human review of prompts and/or completions that have been subject to filtering. You can learn more about content filtering in the service here: Azure OpenAI Service content filtering - Azure OpenAI | Microsoft Learn.
You can also apply to modify Azure OpenAI content filtering here: Azure OpenAI Limited Access Review: Modified Content Filters and Abuse Monitoring (microsoft.com) . As described above, although customers use the same form to request modification of content filtering and/or abuse monitoring, the impact on data logging and access differs depending on which aspects of the content management system are modified.
How can a customer verify if content filtering is off?
If a customer wants to verify that content filtering for their approved subscription ID(s) has been configured off, they can create a support ticket in the Azure Portal in one of two ways:
From the main portal dashboard in Help + Support in the side navigation bar
Within the Azure OpenAI resource itself in New Support Request in the side navigation bar
What is covered by customer managed key encryption?
Customer-managed keys (CMK), also known as Bring your own key (BYOK), offer greater flexibility to create, rotate, disable, and revoke access controls. You can also audit the encryption keys used to protect your data. CMK encrypts all customer data stored at rest in the Azure OpenAI Service (such as data uploaded for fine-tuning) except for data logged for 30 days as described above. (Learn more: Azure OpenAI Service encryption of data at rest - Azure Cognitive Services | Microsoft Learn)
What happens if Microsoft needs access to my data?
Most operations, support, and troubleshooting performed by Microsoft personnel and sub-processors do not require access to customer data. In those rare circumstances where such access is required, Customer Lockbox for Microsoft Azure provides an interface for customers to review and approve or reject customer data access requests. Customer Lockbox is used in cases where a Microsoft engineer needs to access customer data, whether in response to a customer-initiated support ticket or a problem identified by Microsoft. In the Azure OpenAI Service, Customer Lockbox applies to all customer data stored by the service (such as data uploaded for fine-tuning) except for the prompts and completions logged and accessed for the purposes of abuse monitoring as described above.
You can learn more about customer lockbox here: Customer Lockbox for Microsoft Azure | Microsoft Learn
Is customer data processed by Azure OpenAI sent to OpenAI?
No. Microsoft hosts the OpenAI models within our Azure infrastructure, and all customer data sent to Azure OpenAI remains within the Azure OpenAI service.
Is customer data used to train the OpenAI models?
No. We do not use customer data to train, retrain or improve the models in the Azure OpenAI Service.
Limited access to Azure OpenAI Service
Code of conduct for Azure OpenAI Service integrations Transparency note and use cases for Azure OpenAI Service Characteristics and limitations for Azure OpenAI Service
Report abuse of Azure OpenAI Service through the Report Abuse Portal Report problematic content to cscraireport@microsoft.com
Article • 04/06/2023
This article provides details on the inference REST API endpoints for Azure OpenAI.
Authentication

Azure OpenAI provides two methods for authentication. You can use either API Keys or Azure Active Directory.
API Key authentication: For this type of authentication, all API requests must include the API Key in the api-key HTTP header. The Quickstart provides guidance for how to make calls with this type of authentication.
Azure Active Directory authentication: You can authenticate an API call using an Azure Active Directory token. Authentication tokens are included in a request as the Authorization header. The token provided must be preceded by Bearer , for example Bearer YOUR_AUTH_TOKEN . You can read our how-to guide on authenticating with Azure Active Directory.
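The two header styles can be sketched side by side (resource and deployment names are placeholders; the helper names are illustrative, and either headers dict can be used with any HTTP client):

```python
RESOURCE = "YOUR_RESOURCE_NAME"      # placeholder
DEPLOYMENT = "YOUR_DEPLOYMENT_NAME"  # placeholder
URL = (f"https://{RESOURCE}.openai.azure.com/openai/deployments/"
       f"{DEPLOYMENT}/completions?api-version=2022-12-01")

def key_headers(api_key):
    """API Key authentication: the key is sent in the api-key header."""
    return {"Content-Type": "application/json", "api-key": api_key}

def aad_headers(aad_token):
    """Azure AD authentication: the token is sent in the Authorization
    header, preceded by "Bearer "."""
    return {"Content-Type": "application/json",
            "Authorization": f"Bearer {aad_token}"}

# e.g. requests.post(URL, headers=key_headers("YOUR_API_KEY"), json={...})
```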
REST API versioning

The service APIs are versioned using the api-version query parameter. All versions follow the YYYY-MM-DD date structure. For example:
HTTP |
POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME NT_NAME/completions?api-version=2022-12-01 |
Completions

With the Completions operation, the model will generate one or more predicted completions based on a provided prompt. The service can also return the probabilities of alternative tokens at each position.
Create a completion
HTTP |
POST https://{your-resource- name}.openai.azure.com/openai/deployments/{deployment-id}/completions?api-version={api-version} |
Path parameters
Parameter | Type | Required? | Description |
your-resource-name | string | Required | The name of your Azure OpenAI Resource. |
deployment-id | string | Required | The deployment name you chose when you deployed the model. |
api-version | string | Required | The API version to use for this operation. This follows the YYYY-MM-DD format. |
Supported versions
2023-03-15-preview
2022-12-01
Request body
Parameter | Type | Required? | Default | Description |
prompt | string or array | Optional | <\|endoftext\|> | The prompt(s) to generate completions for, encoded as a string, a list of strings, or a list of token lists. Note that <\|endoftext\|> is the document separator that the model sees during training, so if a prompt isn't specified the model will generate as if from the beginning of a new document. |
Parameter | Type | Required? | Default | Description |
max_tokens | integer | Optional | 16 | The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096). |
temperature | number | Optional | 1 | What sampling temperature to use, between 0 and 2. Higher values means the model will take more risks. Try 0.9 for more creative applications, and 0 ( argmax sampling ) for ones with a well-defined answer. We generally recommend altering this or top_p but not both. |
top_p | number | Optional | 1 | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
Parameter | Type | Required? | Default | Description |
logit_bias | map | Optional | null | Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use this tokenizer tool (which works for both GPT-2 and GPT-3) to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. As an example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated. |
user | string | Optional | null | A unique identifier representing your end-user, which can help with monitoring and detecting abuse. |
n | integer | Optional | 1 | How many completions to generate for each prompt. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. |
stream | boolean | Optional | False | Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. |
Parameter | Type | Required? | Default | Description |
logprobs | integer | Optional | null | Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. For example, if logprobs is 10, the API will return a list of the 10 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response. This parameter cannot be used with gpt-35-turbo . |
suffix | string | Optional | null | The suffix that comes after a completion of inserted text. |
echo | boolean | Optional | False | Echo back the prompt in addition to the completion. This parameter cannot be used with gpt-35-turbo . |
stop | string or array | Optional | null | Up to four sequences where the API will stop generating further tokens. The returned text won't contain the stop sequence. |
presence_penalty | number | Optional | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
frequency_penalty | number | Optional | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
Parameter | Type | Required? | Default | Description |
best_of | integer | Optional | 1 | Generates best_of completions server-side and returns the "best" (the one with the highest log probability per token). Results can't be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return – best_of must be greater than n. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. This parameter cannot be used with gpt-35-turbo . |
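As an illustrative sketch, a request body combining several of the parameters above might be assembled like this (the prompt and the specific values are hypothetical):

```python
import json

body = {
    "prompt": "Write a tagline for an ice cream shop.",
    "max_tokens": 60,               # prompt tokens + max_tokens must fit the context length
    "temperature": 0.9,             # alter this or top_p, not both
    "stop": ["\n\n"],               # up to four stop sequences
    "n": 2,                         # completions returned per prompt
    "best_of": 3,                   # candidates generated; must exceed n
    "logit_bias": {"50256": -100},  # effectively bans the <|endoftext|> token
    "user": "pseudonymous-guid",    # avoid PII; use a pseudonymized ID
}
assert body["best_of"] > body["n"]  # constraint noted in the table
payload = json.dumps(body)
```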
Example request
Console |
curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYME NT_NAME/completions?api-version=2022-12-01\ -H "Content-Type: application/json" \ -H "api-key: YOUR_API_KEY" \ -d "{ \"prompt\": \"Once upon a time\", \"max_tokens\": 5 }" |
Example response
JSON |
{ "id": "cmpl-4kGh7iXtjW4lc9eGhff6Hp8C7btdQ", "object": "text_completion", "created": 1646932609, "model": "ada", "choices": [ { "text": ", a dark line crossed", |
"index": 0, "logprobs": null, "finish_reason": "length" } ] } |
In the example response, finish_reason equals stop . If finish_reason equals content_filter , consult our content filtering guide to understand why this is occurring.
Embeddings

Get a vector representation of a given input that can be easily consumed by machine learning models and other algorithms.
Note
We currently do not support batching of embeddings into a single API call. If you receive the error InvalidRequestError: Too many inputs. The max number of inputs is 1. We hope to increase the number of inputs per request soon. , this typically occurs when an array of inputs is passed as a batch rather than a single string. The string can be up to 8191 tokens in length when using the text-embedding-ada-002 (Version 2) model.
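Until batching is supported, one workaround is to send one request per input string. A hedged sketch (embed_each and the stand-in post callable are illustrative; wire post to your preferred HTTP client, and note the newline normalization recommended for non-code inputs below):

```python
def normalize(text):
    """Replace newlines with spaces, as suggested for non-code inputs."""
    return text.replace("\n", " ")

def embed_each(texts, post):
    """Send one embeddings request per string to work around the
    one-input-per-call limit. `post` is any callable that submits a
    request body and returns the parsed JSON response."""
    vectors = []
    for text in texts:
        response = post({"input": normalize(text)})
        vectors.append(response["data"][0]["embedding"])
    return vectors

# Stand-in for a real HTTP call, used here only to show the flow:
fake_post = lambda body: {"data": [{"embedding": [len(body["input"])]}]}
vectors = embed_each(["hello\nworld", "ok"], fake_post)
```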
Create an embedding
HTTP |
POST https://{your-resource- name}.openai.azure.com/openai/deployments/{deployment-id}/embeddings?api-version={api-version} |
Path parameters
Parameter | Type | Required? | Description |
your-resource-name | string | Required | The name of your Azure OpenAI Resource. |
deployment-id | string | Required | The name of your model deployment. You're required to first deploy a model before you can make calls. |
api-version | string | Required | The API version to use for this operation. This follows the YYYY-MM-DD format. |
Supported versions
2023-03-15-preview
2022-12-01
Request body
Parameter | Type | Required? | Default | Description |
input | string | Yes | N/A | Input text to get embeddings for, encoded as a string. The number of input tokens varies depending on what model you are using. Unless you're embedding code, we suggest replacing newlines (\n) in your input with a single space, as we have observed inferior results when newlines are present. |
user | string | No | Null | A unique identifier representing your end-user. This will help Azure OpenAI monitor and detect abuse. Do not pass PII identifiers; instead use pseudoanonymized values such as GUIDs. |
Example request
Console |
curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2022-12-01 \
  -H "Content-Type: application/json" \
  -H "api-key: YOUR_API_KEY" \
  -d "{\"input\": \"The food was delicious and the waiter...\"}" |
Example response
JSON |
{ "object": "list", "data": [ { |
"object": "embedding", "embedding": [ 0.018990106880664825, -0.0073809814639389515, .... (1024 floats total for ada) 0.021276434883475304, ], "index": 0 } ], "model": "text-similarity-babbage:001" } |
Create completions for chat messages with the ChatGPT (preview) and GPT-4 (preview) models. Chat completions are currently only available with api-version=2023-03-15-preview.
Create chat completions
HTTP |
POST https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version={api-version} |
Path parameters
Parameter | Type | Required? | Description |
your-resource-name | string | Required | The name of your Azure OpenAI Resource. |
deployment-id | string | Required | The name of your model deployment. You're required to first deploy a model before you can make calls. |
api-version | string | Required | The API version to use for this operation. This follows the YYYY-MM-DD format. |
Supported versions
2023-03-15-preview
Example request
Console |
curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2023-03-15-preview \
  -H "Content-Type: application/json" \
  -H "api-key: YOUR_API_KEY" \
  -d '{"messages":[{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},{"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},{"role": "user", "content": "Do other Azure Cognitive Services support this too?"}]}' |
Example response
Console |
{"id":"chatcmpl-6v7mkQj980V1yBec6ETrKPRqFjNw9", "object":"chat.completion","created":1679072642, "model":"gpt-35-turbo", "usage":{"prompt_tokens":58, "completion_tokens":68, "total_tokens":126}, "choices":[{"message":{"role":"assistant", "content":"Yes, other Azure Cognitive Services also support customer managed keys. Azure Cognitive Services offer multiple options for customers to manage keys, such as using Azure Key Vault, customer-managed keys in Azure Key Vault or customer-managed keys through Azure Storage service. This helps customers ensure that their data is secure and access to their services is controlled."},"finish_reason":"stop","index":0}]} |
In the example response, finish_reason equals stop. If finish_reason equals content_filter, consult our content filtering guide to understand why this is occurring.
Output formatting adjusted for ease of reading; the actual output is a single block of text without line breaks.
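To pull the useful fields out of a parsed response, a client typically reads the first choice's message, its finish_reason, and the usage block. A minimal sketch over a payload shaped like the example above (the helper name and the trimmed sample values are ours; the field names follow the response schema):

```python
import json

# Trimmed sample shaped like the example chat completion response.
RAW = json.dumps({
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "model": "gpt-35-turbo",
    "usage": {"prompt_tokens": 58, "completion_tokens": 68, "total_tokens": 126},
    "choices": [{
        "message": {"role": "assistant", "content": "Yes, customer managed keys are supported."},
        "finish_reason": "stop",
        "index": 0,
    }],
})

def extract_reply(raw_body):
    """Return (assistant_text, finish_reason, total_tokens) from a chat completion."""
    data = json.loads(raw_body)
    choice = data["choices"][0]
    return (
        choice["message"]["content"],
        choice["finish_reason"],
        data["usage"]["total_tokens"],
    )

text, reason, total = extract_reply(RAW)
```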
Parameter | Type | Required? | Default | Description |
messages | array | Required | N/A | The messages to generate chat completions for, in the chat format. |
temperature | number | Optional | 1 | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.\nWe generally recommend altering this or top_p but not both. |
n | integer | Optional | 1 | How many chat completion choices to generate for each input message. |
stream | boolean | Optional | false | If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message." |
stop | string or array | Optional | null | Up to 4 sequences where the API will stop generating further tokens. |
max_tokens | integer | Optional | inf | The maximum number of tokens allowed for the generated answer. By default, the number of tokens the model can return will be (4096 - prompt tokens). |
presence_penalty | number | Optional | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
frequency_penalty | number | Optional | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
logit_bias | object | Optional | null | Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. |
user | string | Optional | Null | A unique identifier representing your end-user, which can help Azure OpenAI to monitor and detect abuse. |
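The parameters above combine into a single JSON request body; only messages is required. A sketch of a builder that omits unset options so the service defaults apply (the function, its parameter whitelist, and the sample values are illustrative, not part of the API):

```python
import json

def build_chat_request(messages, **options):
    """Build a chat completions request body, omitting unset optional params."""
    allowed = {"temperature", "n", "stream", "stop", "max_tokens",
               "presence_penalty", "frequency_penalty", "logit_bias", "user"}
    body = {"messages": messages}
    for key, value in options.items():
        if key not in allowed:
            raise ValueError(f"unknown parameter: {key}")
        if value is not None:
            body[key] = value
    return body

body = build_chat_request(
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,       # lower temperature: more focused, deterministic output
    stop=["\n\n"],         # up to 4 stop sequences
    user="user-guid-1234", # pseudonymous ID, never PII
)
payload = json.dumps(body)  # this string becomes the POST body
```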
Azure OpenAI is deployed as part of Azure Cognitive Services. All Cognitive Services rely on the same set of management APIs for creation, update, and delete operations. The management APIs are also used for deploying models within an OpenAI resource.
Management APIs reference documentation
Learn about managing deployments, models, and fine-tuning with the REST API. Learn more about the underlying models that power Azure OpenAI.
Reference
Service: Cognitive Services API Version: 2022-12-01
Operation | Description |
Cancel | Cancels the processing of the fine-tune job specified by the given fine-tune-id. |
Create | Creates a job that fine-tunes a specified model from a given training file. Response includes details of the enqueued job including job status and hyper paramete... |
Delete | Deletes the fine-tune job specified by the given fine-tune-id. |
Get | Gets details for a single fine-tune job specified by the given fine-tune-id. The details contain the base model, training and validation files, hyper parameters,... |
Get Events | Gets the events for the fine-tune job specified by the given fine-tune-id. Events are created when the job status changes, e.g. running or complete, and when res... |
List | Gets a list of all fine-tune jobs owned by the Azure OpenAI resource. The details that are returned for each fine-tune job contain besides its identifier the base... |
Deployments - Create Or Update
Reference
Service: Cognitive Services API Version: 2021-10-01
Update the state of specified deployments associated with the Cognitive Services account.
HTTP |
PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2021-10-01 |
URI Parameters
Name | In | Required | Type | Description |
accountName | path | True | string | The name of Cognitive Services account. Regex pattern: ^[a-zA-Z0-9][a-zA-Z0-9_.-]*$ |
deploymentName | path | True | string | The name of the deployment associated with the Cognitive Services Account |
resourceGroupName | path | True | string | The name of the resource group. The name is case insensitive. |
subscriptionId | path | True | string | The ID of the target subscription. |
api-version | query | True | string | The API version to use for this operation. |
Request Body
Name | Type | Description |
properties | DeploymentProperties | Properties of Cognitive Services account deployment. |
Responses
Name | Type | Description |
200 OK | Deployment | Create/Update the deployment successfully. |
201 Created | Deployment | Create the deployment successfully. |
Other Status Codes | ErrorResponse | Error response describing why the operation failed. |
Examples PutDeployment Sample Request
HTTP
PUT https://management.azure.com/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.CognitiveServices/accounts/accountName/deployments/deploymentName?api-version=2021-10-01

{
  "properties": {
    "model": {
      "format": "OpenAI",
      "name": "ada",
      "version": "1"
    },
    "scaleSettings": {
      "scaleType": "Manual",
      "capacity": 1
    }
  }
} |
Sample Response
Status code: 200
JSON |
{ "id": "/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.CognitiveServices/accounts/accountName/ deployments/deploymentName", "name": "deploymentName", "type": "Microsoft.CognitiveServices/accounts/deployments", "properties": { "model": { "format": "OpenAI", "name": "ada", "version": "1" }, "scaleSettings": { "scaleType": "Manual", "capacity": 1 }, "provisioningState": "Succeeded" } } |
Status code: 201
JSON |
{ "id": "/subscriptions/subscriptionId/resourceGroups/resourceGroupName/providers/Microsoft.CognitiveServices/accounts/accountName/ deployments/deploymentName", "name": "deploymentName", "type": "Microsoft.CognitiveServices/accounts/deployments", "properties": { "model": { "format": "OpenAI", "name": "ada", "version": "1" }, "scaleSettings": { "scaleType": "Manual", "capacity": 1 }, "provisioningState": "Accepted" } } |
Definitions
Name | Description |
createdByType | The type of identity that created the resource. |
Deployment | Cognitive Services account deployment. |
DeploymentModel | Properties of Cognitive Services account deployment model. |
DeploymentProperties | Properties of Cognitive Services account deployment. |
DeploymentProvisioningState | Gets the status of the resource at the time the operation was called. |
DeploymentScaleSettings | Scale settings of a Cognitive Services account deployment. |
DeploymentScaleType | Deployment scale type. |
ErrorAdditionalInfo | The resource management error additional info. |
ErrorDetail | The error detail. |
ErrorResponse | Error response |
systemData | Metadata pertaining to creation and last modification of the resource. |
createdByType
The type of identity that created the resource.
Name | Type | Description |
Application string |
Key string |
ManagedIdentity string |
User string |
Deployment
Cognitive Services account deployment.
Name | Type | Description |
etag | string | Resource Etag. |
id | string | Fully qualified resource ID for the resource. Ex - /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/{resourceProviderNamespace}/{resourceType} |
name | string | The name of the resource |
properties | Deployment Properties | Properties of Cognitive Services account deployment. |
systemData | systemData | Metadata pertaining to creation and last modification of the resource. |
type | string | The type of the resource. E.g. "Microsoft.Compute/virtualMachines" or "Microsoft.Storage/storageAccounts" |
DeploymentModel
Properties of Cognitive Services account deployment model.
Name | Type | Description |
format | string | Deployment model format. |
name | string | Deployment model name. |
version | string | Deployment model version. |
DeploymentProperties
Properties of Cognitive Services account deployment.
Name | Type | Description |
model | Deployment Model | Properties of Cognitive Services account deployment model. |
provisioningState | Deployment ProvisioningState | Gets the status of the resource at the time the operation was called. |
scaleSettings | Deployment ScaleSettings | Scale settings of a Cognitive Services account deployment. |
DeploymentProvisioningState
Gets the status of the resource at the time the operation was called.
Name | Type | Description |
Accepted string |
Creating string |
Deleting string |
Failed string |
Moving string |
Succeeded string |
DeploymentScaleSettings
Scale settings of a Cognitive Services account deployment.
Name | Type | Description |
capacity | integer | Deployment capacity. |
scaleType | Deployment ScaleType | Deployment scale type. |
DeploymentScaleType
Deployment scale type.
Name | Type | Description |
Manual string |
Standard string |
ErrorAdditionalInfo
The resource management error additional info.
Name | Type | Description |
info | object | The additional info. |
type | string | The additional info type. |
ErrorDetail
The error detail.
Name | Type | Description |
additionalInfo | ErrorAdditional Info[] | The error additional info. |
code | string | The error code. |
details | ErrorDetail[] | The error details. |
message | string | The error message. |
target | string | The error target. |
ErrorResponse
Error response
Name | Type | Description |
error | ErrorDetail | The error object. |
systemData
Metadata pertaining to creation and last modification of the resource.
Name | Type | Description |
createdAt | string | The timestamp of resource creation (UTC). |
createdBy | string | The identity that created the resource. |
createdByType | createdByType | The type of identity that created the resource. |
lastModifiedAt | string | The timestamp of resource last modification (UTC). |
lastModifiedBy | string | The identity that last modified the resource. |
lastModifiedByType | createdByType | The type of identity that last modified the resource. |
Reference
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
The representation of a single completions result for a chat completions request.
A representation of a full response to a chat completions request.
The configuration information used for a chat completions request.
A single, role-attributed message within a chat completion interaction.
Choice model within completion response.
Expected response schema to completion request.
LogProbs model within completion choice.
Post body schema to create a prompt completion from a deployment.
Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.
Expected response schema to embeddings object list item request.
Expected response schema to embeddings request.
Schema to create a prompt completion from a deployment.
Measurement of the amount of tokens used in this request and response.
OpenAIClient | Azure OpenAI APIs for completions and search. |
OpenAIClientOptions | Client options for OpenAIClient. |
ChatRole | A description of the intended purpose of a message within a chat completions interaction. |
OpenAIClientOptions.ServiceVersion | The version of the service to use. |
Reference
Package: com.azure.ai.openai
Maven Artifact: com.azure:azure-ai-openai:1.0.0-beta.1
Package containing the classes for OpenAI. Azure OpenAI APIs for completions and search.
OpenAIAsyncClient Initializes a new instance of the asynchronous OpenAIClient type. |
OpenAIClient Initializes a new instance of the synchronous OpenAIClient type. |
OpenAIClientBuilder A builder for creating a new instance of the OpenAIClient type. |
OpenAIServiceVersion Service version of OpenAIClient. |
Reference
AzureKeyCredential A static-key-based credential that supports updating the underlying key value.
OpenAIClient A client for interacting with Azure OpenAI.
The client needs the endpoint of an OpenAI resource and an authentication method such as an API key or token. The API key and endpoint can be found in the OpenAI resource page. They will be located in the resource's Keys and Endpoint page.
Examples for authentication:
API Key
JavaScript

import { OpenAIClient } from "@azure/openai";
import { AzureKeyCredential } from "@azure/core-auth";

const endpoint = "<azure endpoint>";
const credential = new AzureKeyCredential("<api key>");
const client = new OpenAIClient(endpoint, credential);
Azure Active Directory
JavaScript

import { OpenAIClient } from "@azure/openai";
import { DefaultAzureCredential } from "@azure/identity";

const endpoint = "<azure endpoint>";
const credential = new DefaultAzureCredential();
const client = new OpenAIClient(endpoint, credential); |
OpenAIKeyCredential The OpenAIKeyCredential class represents an OpenAI API key and is used to authenticate into an OpenAI client for an OpenAI endpoint. |
The representation of a single prompt completion as part of an overall chat completions request. Generally, n choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated.
Representation of the response data from a chat completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.
A single, role-attributed message within a chat completion interaction.
The representation of a single prompt completion as part of an overall completions request. Generally, n choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated.
Representation of the response data from a completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.
Representation of a log probabilities model for a completions generation.
Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.
Representation of a single embeddings relatedness comparison.
Representation of the response data from an embeddings request. Embeddings measure the relatedness of text strings and are commonly used for search, clustering, recommendations, and other similar scenarios.
Measurement of the amount of tokens used in this request and response.
ChatRole | "system", "assistant", "user" |
CompletionsFinishReason | "stopped", "tokenLimitReached", "contentFiltered" |
Article • 07/22/2022 • 2 minutes to read
Are you just starting to explore the functionality of Azure Cognitive Services? Perhaps you're implementing a new feature in your application. Or, after using the service, do you have suggestions on how to improve it? Here are options for where you can get support, stay up to date, give feedback, and report bugs for Cognitive Services.
Explore the range of Azure support options and choose the plan that best fits, whether you're a developer just starting your cloud journey or a large organization deploying business-critical, strategic applications. Azure customers can create and manage support requests in the Azure portal.
Azure portal for the United States government
For quick and reliable answers on your technical product questions from Microsoft Engineers, Azure Most Valuable Professionals (MVPs), or our expert community, engage with us on Microsoft Q&A, Azure's preferred destination for community support.
If you can't find an answer to your problem using search, submit a new question to Microsoft Q&A. Use one of the following tags when you ask your question:
Vision
Computer Vision Custom Vision Face
Language
Language Understanding (LUIS) QnA Maker
Speech
Decision
Anomaly Detector Content Moderator Metrics Advisor Personalizer
Azure OpenAI
For answers on your developer questions from the largest community developer ecosystem, ask your question on Stack Overflow.
If you do submit a new question to Stack Overflow, please use one or more of the following tags when you create the question:
Vision
Computer Vision Custom Vision Face
Language
QnA Maker Language service Translator
Speech
Decision
Anomaly Detector Content Moderator Metrics Advisor Personalizer
Azure OpenAI
To request new features, post them on https://feedback.azure.com. Share your ideas for making Cognitive Services and its APIs work better for the applications you develop.
Vision
Computer Vision Custom Vision Face
Language
Language Understanding (LUIS) QnA Maker
Speech
Decision
Anomaly Detector Content Moderator Metrics Advisor Personalizer
Staying informed about features in a new release or news on the Azure blog can help you find the difference between a programming error, a service bug, or a feature not yet available in Cognitive Services.
Learn more about product updates, roadmap, and announcements in Azure Updates .
News about Cognitive Services is shared in the Azure blog . Join the conversation on Reddit about Cognitive Services.
What are Azure Cognitive Services?
Documentation
Summarization language support - Azure Cognitive Services
Learn about which languages are supported by document summarization.
Fine Tunes - Create - REST API (Azure Cognitive Services)
Creates a job that fine-tunes a specified model from a given training file. Response includes details of the enqueued job including job status and hyper paramet
Summarize text with the conversation summarization API - Azure Cognitive Services
This article will show you how to summarize chat logs with the conversation summarization API.
Summarize text with the extractive summarization API - Azure Cognitive Services
This article will show you how to summarize text with the extractive summarization API.
What is document and conversation summarization (preview)? - Azure Cognitive Services
Learn about summarizing text.
Fine Tunes - REST API (Azure Cognitive Services)
Learn more about [Cognitive Services Fine Tunes Operations]. How to [Cancel,Create,Delete,Get,Get Events,List].
Quickstart: Use Document Summarization (preview) - Azure Cognitive Services
Use this quickstart to start using Document Summarization.
Deployments - Create - REST API (Azure Cognitive Services)
Learn more about Cognitive Services service - Creates a new deployment for the Azure OpenAI resource according to the given specification.
Training
Learning path
Provision and manage Azure Cognitive Services - Training
Provision and manage Azure Cognitive Services
Certification
Microsoft Certified: Azure Developer Associate - Certifications
Azure developers design, build, test, and maintain cloud applications and services.